SYSTEM FOR IN VITRO TRANSPOSITION USING MODIFIED TN5 TRANSPOSASE

CROSS-REFERENCE TO RELATED APPLICATION 
This patent application is a continuation-in-part of a 
patent application entitled "System for In Vitro 
Transposition," filed March 11, 1997, for which no serial 
number has yet been accorded. Applicants have petitioned for a
filing date of September 9, 1996 to be accorded to the parent
application.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
Not applicable. 

BACKGROUND OF THE INVENTION

The present invention relates generally to the field of 
transposable nucleic acid and, more particularly, to production
and use of a modified transposase enzyme in a system for 
introducing genetic changes to nucleic acid. 

Transposable genetic elements are DNA sequences, found in
a wide variety of prokaryotic and eukaryotic organisms, that
can move or transpose from one position to another position in
a genome. In vivo, intrachromosomal transpositions as well as
transpositions between chromosomal and non-chromosomal genetic
material are known. In several systems, transposition is known
to be under the control of a transposase enzyme that is
typically encoded by the transposable element. The genetic
structures and transposition mechanisms of various transposable
elements are summarized, for example, in "Transposable Genetic
Elements" in "The Encyclopedia of Molecular Biology," Kendrew
and Lawrence, Eds., Blackwell Science, Ltd., Oxford (1994),
incorporated herein by reference.

In vitro transposition systems that utilize the particular
transposable elements of bacteriophage Mu and bacterial
transposon Tn10 have been described by the research groups of
Kiyoshi Mizuuchi and Nancy Kleckner, respectively.

The bacteriophage Mu system was first described by
Mizuuchi, K., "In Vitro Transposition of Bacteriophage Mu: A
Biochemical Approach to a Novel Replication Reaction,"
Cell 35:785-794 (1983) and Craigie, R. et al., "A Defined System
for the DNA Strand-Transfer Reaction at the Initiation of
Bacteriophage Mu Transposition: Protein and DNA Substrate
Requirements," P.N.A.S. U.S.A. 82:7570-7574 (1985). The DNA
donor substrate (mini-Mu) for the Mu in vitro reaction normally
requires six Mu transposase binding sites (three of about 30 bp
at each end) and an enhancer sequence located about 1 kb from
the left end. The donor plasmid must be supercoiled. Proteins
required are Mu-encoded A and B proteins and host-encoded HU
and IHF proteins. Lavoie, B.D. and G. Chaconas, "Transposition
of phage Mu DNA," Curr. Topics Microbiol. Immunol. 204:83-99
(1995). The Mu-based system is disfavored for in vitro
transposition system applications because the Mu termini are
complex and sophisticated and because transposition requires
additional proteins above and beyond the transposase.

The Tn10 system was described by Morisato, D. and N.
Kleckner, "Tn10 Transposition and Circle Formation in vitro,"
Cell 51:101-111 (1987) and by Benjamin, H. W. and N. Kleckner,
"Excision of Tn10 from the Donor Site During Transposition
Occurs By Flush Double-Strand Cleavages at the Transposon
Termini," P.N.A.S. U.S.A. 89:4648-4652 (1992). The Tn10 system
involves a supercoiled circular DNA molecule carrying the
transposable element (or a linear DNA molecule plus E. coli IHF
protein). The transposable element is defined by complex 42 bp
terminal sequences with an IHF binding site adjacent to the
inverted repeat. In fact, even longer (81 bp) ends of Tn10
were used in reported experiments. Sakai, J. et al.,
"Identification and Characterization of Pre-Cleavage Synaptic
Complex that is an Early Intermediate in Tn10 transposition,"
E.M.B.O. J. 14:4374-4383 (1995). In the Tn10 system, chemical
treatment of the transposase protein is essential to support
active transposition. In addition, the termini of the Tn10
element limit its utility in a generalized in vitro
transposition system.

Both the Mu- and Tn10-based in vitro transposition systems
are further limited in that they are active only on covalently
closed circular, supercoiled DNA targets. What is desired is a
more broadly applicable in vitro transposition system that
utilizes shorter, more well defined termini and which is active
on target DNA of any structure (linear, relaxed circular, and
supercoiled circular DNA).

BRIEF SUMMARY OF THE INVENTION 
The present invention is summarized in that an in vitro
transposition system comprises a preparation of a suitably
modified transposase of bacterial transposon Tn5, a donor DNA
molecule that includes a transposable element, and a target DNA
molecule into which the transposable element can transpose, all
provided in a suitable reaction buffer.

The transposable element of the donor DNA molecule is
characterized as a transposable DNA sequence of interest, the
DNA sequence of interest being flanked at its 5'- and 3'-ends
by short repeat sequences that are acted upon in trans by Tn5
transposase.

The invention is further summarized in that the suitably
modified transposase enzyme comprises two classes of
differences from wild type Tn5 transposase, where each class
has a separate measurable effect upon the overall transposition
activity of the enzyme and where a greater effect is observed
when both modifications are present. The suitably modified
enzyme both (1) binds to the repeat sequences of the donor DNA
with greater avidity than wild type Tn5 transposase ("class (1)
mutation") and (2) is less likely than the wild type protein to
assume an inactive multimeric form ("class (2) mutation"). A
suitably modified Tn5 transposase of the present invention that
contains both class (1) and class (2) modifications induces at
least about 100-fold (±10%) more transposition than the wild
type enzyme, when tested in combination in an in vivo
conjugation assay as described by Weinreich, M.D., "Evidence
that the cis Preference of the Tn5 Transposase is Caused by
Nonproductive Multimerization," Genes and Development 8:2363-
2374 (1994), incorporated herein by reference. Under optimal
conditions, transposition using the modified transposase may be
higher. A modified transposase containing only a class (1)
mutation binds to the repeat sequences with sufficiently
greater avidity than the wild type Tn5 transposase that such a
Tn5 transposase induces about 5- to 50-fold more transposition
than the wild type enzyme, when measured in vivo. A modified
transposase containing only a class (2) mutation is
sufficiently less likely than the wild type Tn5 transposase to
assume the multimeric form that such a Tn5 transposase also
induces about 5- to 50-fold more transposition than the wild
type enzyme, when measured in vivo.

In another aspect, the invention is summarized in that a 
method for transposing the transposable element from the donor 
DNA into the target DNA in vitro includes the steps of mixing 
together the suitably modified Tn5 transposase protein, the 
donor DNA, and the target DNA in a suitable reaction buffer, 
allowing the enzyme to bind to the flanking repeat sequences of 
the donor DNA at a temperature greater than 0°C, but no higher 
than about 28°C, and then raising the temperature to
physiological temperature (about 37°C) whereupon cleavage and
strand transfer can occur. 

It is an object of the present invention to provide a 
useful in vitro transposition system having few structural 
requirements and high efficiency. 

It is another object of the present invention to provide a 
method that can be broadly applied in various ways, such as to 
create absolute defective mutants, to provide selective markers 
to target DNA, to provide portable regions of homology to a 
target DNA, to facilitate insertion of specialized DNA 
sequences into target DNA, to provide primer binding sites or 
tags for DNA sequencing, to facilitate production of genetic 
fusions for gene expression studies and protein domain mapping, 
as well as to bring together other desired combinations of DNA 
sequences (combinatorial genetics).

It is a feature of the present invention that the modified
transposase enzyme binds more tightly to DNA than does wild
type Tn5 transposase.

It is an advantage of the present invention that the
modified transposase facilitates in vitro transposition
reaction rates of at least about 100-fold higher than can be
achieved using wild type transposase (as measured in vivo). It
is noted that the wild-type Tn5 transposase shows no detectable
in vitro activity in the system of the present invention.
Thus, while it is difficult to calculate an upper limit to the
increase in activity, it is clear that hundreds, if not
thousands, of colonies are observed when the products of in
vitro transposition are assayed in vivo.

It is another advantage of the present invention that in
vitro transposition using this system can utilize donor DNA and
target DNA that is circular or linear.

It is yet another advantage of the present invention that
in vitro transposition using this system requires no outside
high energy source and no protein other than the modified
transposase.

Other objects, features, and advantages of the present
invention will become apparent upon consideration of the
following detailed description.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 
Fig. 1 depicts test plasmid pRZTL1, used herein to
demonstrate transposition in vitro of a transposable element
located between a pair of Tn5 outside end termini. Plasmid
pRZTL1 is also shown and described in SEQ ID NO:3.

Fig. 2 depicts an electrophoretic analysis of plasmid
pRZTL1 before and after in vitro transposition. Data obtained
using both circular and linear plasmid substrates are shown.

Fig. 3 is an electrophoretic analysis of plasmid pRZTL1
after in vitro transposition, including further analysis of the
molecular species obtained using circular and linear plasmid
substrates.

Fig. 4 shows plasmids pRZ1496, pRZ5451 and pRZTL1, which
are detailed in the specification.

Fig. 5 shows a plot of papillae per colony over time for
various mutant OE sequences tested in vivo against EK54/MA56
transposase.

Fig. 6 shows a plot of papillae per colony over time for
various mutant OE sequences with a smaller Y-axis than is shown
in Fig. 5 tested against EK54/MA56 transposase.

Fig. 7 shows a plot of papillae per colony over time for
various mutant OE sequences tested against MA56 Tn5
transposase.

Fig. 8 shows in vivo transposition using two preferred
mutants, tested against MA56 and EK54/MA56 transposase.

DETAILED DESCRIPTION OF THE INVENTION 
It will be appreciated that this technique provides a 
simple, in vitro system for introducing any transposable 
element from a donor DNA into a target DNA. It is generally 
accepted and understood that Tn5 transposition requires only a 
pair of OE termini, located to either side of the transposable 
element. These OE termini are generally thought to be 18 or 19 
bases in length and are inverted repeats relative to one 
another. Johnson, R. C. and W. S. Reznikoff, Nature 304:280
(1983), incorporated herein by reference. The Tn5 inverted 
repeat sequences, which are referred to as "termini" even 
though they need not be at the termini of the donor DNA 
molecule, are well known and understood. 

Apart from the need to flank the desired transposable 
element with standard Tn5 outside end ("OE") termini, few other 
requirements on either the donor DNA or the target DNA are 
envisioned. It is thought that Tn5 has few, if any, 
preferences for insertion sites, so it is possible to use the 
system to introduce desired sequences at random into target 
DNA. Therefore, it is believed that this method, employing the 
modified transposase described herein and a simple donor DNA, 
is broadly applicable to introduce changes into any target DNA, 
without regard to its nucleotide sequence. It will, thus, be 
applied to many problems of interest to those skilled in the 
art of molecular biology. 

In the method, the modified transposase protein is
combined in a suitable reaction buffer with the donor DNA and
the target DNA. A suitable reaction buffer permits the
transposition reaction to occur. A preferred, but not
necessarily optimized, buffer contains spermidine to condense
the DNA, glutamate, and magnesium, as well as a detergent,
which is preferably 3-[(3-cholamidopropyl)dimethyl-ammonio]-1-
propane sulfonate ("CHAPS"). The mixture can be incubated at a
temperature greater than 0°C and as high as about 28°C to
facilitate binding of the enzyme to the OE termini. Under the
buffer conditions used by the inventors in the Examples, a
pretreatment temperature of 30°C was not adequate. A preferred
temperature range is between 16°C and 28°C. A most preferred
pretreatment temperature is about 20°C. Under different buffer
conditions, however, it may be possible to use other
below-physiological temperatures for the binding step. After a short
pretreatment period of time (which has not been optimized, but
which may be as little as 30 minutes or as much as 2 hours, and
is typically 1 hour), the reaction mixture is diluted with 2
volumes of a suitable reaction buffer and shifted to
physiological conditions for several more hours (say 2-3 hours)
to permit cleavage and strand transfer to occur. A temperature
of 37°C, or thereabouts, is adequate. After about 3 hours, the
rate of transposition decreases markedly. The reaction can be
stopped by phenol-chloroform extraction and can then be
desalted by ethanol precipitation.
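
The two-step incubation described above can be sketched as a small
scheduling helper. This is an illustrative sketch only: the names
Step and two_step_schedule are hypothetical, and treating the stated
temperature and time ranges as hard limits is an assumption made for
the example, since the text notes that other conditions may work
under different buffers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: int


def two_step_schedule(start_volume_ul, bind_temp_c=20.0, bind_minutes=60,
                      react_minutes=180):
    """Return (steps, final_volume_ul) for the binding and strand-transfer steps."""
    # Binding step: above 0 C and no higher than about 28 C
    # (30 C was reported as not adequate under the Example buffer).
    if not 0.0 < bind_temp_c <= 28.0:
        raise ValueError("binding temperature must be above 0 and at most 28 C")
    # Pretreatment time: 30 minutes to 2 hours, typically 1 hour.
    if not 30 <= bind_minutes <= 120:
        raise ValueError("binding time must be between 30 and 120 minutes")
    steps = [
        Step("bind transposase to OE termini", bind_temp_c, bind_minutes),
        Step("cleavage and strand transfer", 37.0, react_minutes),
    ]
    # Adding 2 volumes of reaction buffer triples the total volume.
    final_volume_ul = start_volume_ul * 3
    return steps, final_volume_ul
```

For example, two_step_schedule(10) describes a 10 µl reaction held
at 20°C for 1 hour, then diluted to 30 µl and incubated at 37°C for
3 hours.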

When the DNA has been purified using conventional
purification tools, it is possible to employ simpler reaction
conditions in the in vitro transposition method. DNA of
sufficiently high purity can be prepared by passing the DNA
preparation through a resin of the type now commonly used in
the molecular biology laboratory, such as the Qiagen resin of
the Qiagen plasmid purification kit (Catalog No. 12162). When
such higher quality DNA is employed, CHAPS can be omitted from
the reaction buffer. When CHAPS is eliminated from the
reaction buffer, the reactants need not be diluted in the
manner described above. Also, the low temperature incubation
step noted above can be eliminated in favor of a single
incubation for cleavage and strand transfer at physiological
conditions. A three hour incubation at 37°C is sufficient.

Following the reaction and subsequent extraction steps, 
transposition can be assayed by introducing the nucleic acid 
reaction products into suitable bacterial host cells (e.g., E.
coli K-12 DH5α cells (recA-); commercially available from Life
Technologies (Gibco-BRL)) preferably by electroporation,
described by Dower et al., Nuc. Acids. Res. 16:6127 (1988), and
monitoring for evidence of transposition, as is described 
elsewhere herein. 

Those persons skilled in the art will appreciate that 
apart from the changes noted herein, the transposition reaction 
can proceed under much the same conditions as would be found in 
an in vivo reaction. Yet, the modified transposase described 
herein so increases the level of transposition activity that it 
is now possible to carry out this reaction in vitro where this 
has not previously been possible. The rates of reaction are 
even greater when the modified transposase is coupled with the
optimized buffer and temperature conditions noted herein.

In another aspect, the present invention is a preparation 
of a modified Tn5 transposase enzyme that differs from wild 
type Tn5 transposase in that it (1) binds to the repeat 
sequences of the donor DNA with greater avidity than wild type 
Tn5 transposase and (2) is less likely than the wild type 
protein to assume an inactive multimeric form. An enzyme 
having these requirements can be obtained from a bacterial host 
cell containing an expressible gene for the modified enzyme 
that is under the control of a promoter active in the host 
cell. Genetic material that encodes the modified Tn5 
transposase can be introduced (e.g., by electroporation) into 
suitable bacterial host cells capable of supporting expression 
of the genetic material. Known methods for overproducing and 
preparing other Tn5 transposase mutants are suitably employed. 
For example, Weinreich, M. D., et al., supra, describes a 
suitable method for overproducing a Tn5 transposase. A second 
method for purifying Tn5 transposase was described in de la
Cruz, N. B., et al., "Characterization of the Tn5 Transposase
and Inhibitor Proteins: A Model for the Inhibition of
Transposition," J. Bact. 175:6932-6938 (1993), also
incorporated herein by reference. It is noted that induction
can be carried out at temperatures below 37°C, which is the
temperature used by de la Cruz, et al. Temperatures at least in
the range of 33 to 37°C are suitable. The inventors have
determined that the method for preparing the modified
transposase of the present invention is not critical to success
of the method, as various preparation strategies have been used
with equal success.

Alternatively, the protein can be chemically synthesized,
in a manner known to the art, using the amino acid sequence
attached hereto as SEQ ID NO:2 as a guide. It is also possible
to prepare a genetic construct that encodes the modified
protein (and associated transcription and translation signals)
by using standard recombinant DNA methods familiar to molecular
biologists. The genetic material useful for preparing such
constructs can be obtained from existing Tn5 constructs, or can
be prepared using known methods for introducing mutations into
genetic material (e.g., random mutagenesis PCR or site-directed
mutagenesis) or some combination of both methods. The genetic
sequence that encodes the protein shown in SEQ ID NO:2 is set
forth in SEQ ID NO:1.

The nucleic acid and amino acid sequence of wild type Tn5
transposase are known and published. N.C.B.I. Accession Number
U00004 L19385, incorporated herein by reference.

In a preferred embodiment, the improved avidity of the
modified transposase for the repeat sequences for OE termini
(class (1) mutation) can be achieved by providing a lysine
residue at amino acid 54, which is glutamic acid in wild type
Tn5 transposase. The mutation strongly alters the preference
of the transposase for OE termini, as opposed to inside end
("IE") termini. The higher binding of this mutation, known as
EK54, to OE termini results in a transposition rate that is
about 10-fold higher than is seen with wild type transposase.
A similar change at position 54 to valine (mutant EV54) also
results in somewhat increased binding/transposition for OE
termini, as does a threonine-to-proline change at position 47
(mutant TP47; about 10-fold higher). It is believed that
other, comparable transposase mutations (in one or more amino
acids) that increase binding avidity for OE termini may also be
obtained which would function as well or better in the in vitro
assay described herein.

One of ordinary skill will also appreciate that changes to 
the nucleotide sequences of the short repeat sequences of the 
donor DNA may coordinate with other mutation(s) in or near the
binding region of the transposase enzyme to achieve the same
increased binding effect, and the resulting 5- to 50-fold
increase in transposition rate. Thus, while the applicants
have exemplified one case of a mutation that improves binding
of the exemplified transposase, it will be understood that
other mutations in the transposase, or in the short repeat
sequences, or in both, will also yield transposases that fall
within the scope and spirit of the present invention. A
suitable method for determining the relative avidity for Tn5 OE
termini has been published by Jilk, R. A., et al., "The
Organization of the Outside End of Transposon Tn5," J. Bact.
178:1671-79 (1996).

The transposase of the present invention is also less
likely than the wild type protein to assume an inactive
multimeric form. In the preferred embodiment, that class (2)
mutation from wild type can be achieved by modifying amino acid
372 (leucine) of wild type Tn5 transposase to a proline (and,
likewise, by modifying the corresponding DNA to encode proline).
This mutation, referred to as LP372, has previously been
characterized as a mutation in the dimerization region of the
transposase. Weinreich, et al., supra. It was noted by
Weinreich et al. that this mutation at position 372 maps to a
region shown previously to be critical for interaction with an
inhibitor of Tn5 transposition. The inhibitor is a protein
encoded by the same gene that encodes the transposase, but
which is truncated at the N-terminal end of the protein,
relative to the transposase. The approach of Weinreich et al.
for determining the extent to which multimers are formed is
suitable for determining whether a mutation falls within the
scope of this element.

It is thought that when wild type Tn5 transposase
multimerizes, its activity in trans is reduced. Presumably, a
mutation in the dimerization region reduces or prevents
multimerization, thereby reducing inhibitory activity and
leading to levels of transposition 5- to 50-fold higher than
are seen with the wild type transposase. The LP372 mutation
achieves about 10-fold higher transposition levels than wild
type. Likewise, other mutations (including mutations at one
or more amino acids) that reduce the ability of the transposase
to multimerize would also function in the same manner as the
single mutation at position 372, and would also be suitable in
a transposase of the present invention. It may also be
possible to reduce the ability of a Tn5 transposase to
multimerize without altering the wild type sequence in the
so-called dimerization region, for example by adding into the
system another protein or non-protein agent that blocks the
dimerization site. Alternatively, the dimerization region
could be removed entirely from the transposase protein.

As was noted above, the inhibitor protein, encoded in
partially overlapping sequence with the transposase, can
interfere with transposase activity. As such, it is desired
that the amount of inhibitor protein be reduced over the amount
observed in wild type in vivo. For the present assay, the
transposase is used in purified form, and it may be possible to
separate the transposase from the inhibitor (for example,
according to differences in size) before use. However, it is
also possible to genetically eliminate the possibility of
having any contaminating inhibitor protein present by removing
its start codon from the gene that encodes the transposase.

An AUG in the wild type Tn5 transposase gene that encodes
methionine at transposase amino acid 56 is the first codon of
the inhibitor protein. However, it has already been shown that
replacement of the methionine at position 56 has no apparent
effect upon the transposase activity, but at the same time
prevents translation of the inhibitor protein, thus resulting
in a somewhat higher transposition rate. Weigand, T. W. and W.
S. Reznikoff, "Characterization of Two Hypertransposing Tn5
Mutants," J. Bact. 174:1229-1239 (1992), incorporated herein by
reference. In particular, the present inventors have replaced
the methionine with an alanine in the preferred embodiment (and
have replaced the methionine-encoding AUG codon with an
alanine-encoding GCC). A preferred transposase of the present
invention therefore includes an amino acid other than
methionine at amino acid position 56, although this change can
be considered merely technically advantageous (since it ensures
the absence of the inhibitor from the in vitro system) and not
essential to the invention (since other means can be used to
eliminate the inhibitor protein from the in vitro system).

The most preferred transposase amino acid sequence known 
to the inventors differs from the wild type at amino acid 
positions 54, 56, and 372. The mutations at positions 54 and 
372 separately contribute approximately a 10-fold increase to 
the rate of transposition reaction in vivo. When the mutations 
are combined using standard recombinant techniques into a 
single molecule containing both classes of mutations, reaction 
rates of at least about 100-fold higher than can be achieved
using wild type transposase are observed when the products of 
the in vitro system are tested in vivo. The mutation at 
position 56 does not directly affect the transposase activity. 
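
The mutant names used in this description (e.g., EK54, MA56, LP372)
follow a simple pattern: wild type residue, replacement residue,
position. The sketch below parses that pattern and multiplies the
approximate per-mutation fold increases stated above. The function
names are hypothetical, and treating the effects as independent and
multiplicative is an assumption of the example; it merely reproduces
the arithmetic 10 x 10 = 100 that is consistent with the reported
"at least about 100-fold" figure.

```python
import math
import re

# One-letter amino acid codes used by the mutant names in the text.
AMINO_ACIDS = {
    "A": "alanine", "E": "glutamic acid", "K": "lysine", "L": "leucine",
    "M": "methionine", "P": "proline", "T": "threonine", "V": "valine",
}


def parse_mutation(name):
    """Split a name such as 'EK54' into (wild type, replacement, position)."""
    match = re.fullmatch(r"([A-Z])([A-Z])(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {name!r}")
    wild_type, replacement, position = match.groups()
    return wild_type, replacement, int(position)


def describe(name):
    """Spell out a mutation name, e.g. 'glutamic acid 54 to lysine'."""
    wild_type, replacement, position = parse_mutation(name)
    return f"{AMINO_ACIDS[wild_type]} {position} to {AMINO_ACIDS[replacement]}"


def combined_fold(fold_increases):
    """Multiply per-mutation fold increases (assumes independent effects)."""
    return math.prod(fold_increases)


# Approximate in vivo fold increases stated in the text; MA56 is entered
# as 1 because it is said not to affect transposase activity directly.
APPROX_FOLD = {"EK54": 10, "MA56": 1, "LP372": 10}
```

Here describe("LP372") spells out the leucine-to-proline change at
position 372, and combined_fold(APPROX_FOLD.values()) gives the
product of the three entries.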

Other mutants from wild type that are contemplated to be 
likely to contribute to high transposase activity in vitro 
include, but are not limited to, glutamic acid-to-lysine at
position 110, and glutamic acid-to-lysine at position 345.

It is, of course, understood that other changes apart from 
these noted positions can be made to the modified transposase 
(or to a construct encoding the modified transposase) without 
adversely affecting the transposase activity. For example, it 
is well understood that a construct encoding such a transposase 
could include changes in the third position of codons such that 
the encoded amino acid does not differ from that described 
herein. In addition, certain codon changes have little or no 

WO 98/10077 PCT/US97/15941 

functional effect upon the transposition activity of the
encoded protein. Finally, other changes may be introduced
which provide yet higher transposition activity in the encoded
protein. It is also specifically envisioned that combinations
of mutations can be combined to encode a modified transposase
having even higher transposition activity than has been
exemplified herein. All of these changes are within the scope
of the present invention. It is noted, however, that a
modified transposase containing the EK110 and EK345 mutations
(both described by Weigand and Reznikoff, supra) had lower
transposase activity than a transposase containing either
mutation alone.

After the enzyme is prepared and purified, as described
supra, it can be used in the in vitro transposition reaction
described above to introduce any desired transposable element
from a donor DNA into a target DNA. The donor DNA can be
circular or can be linear. If the donor DNA is linear, it is
preferred that the repeat sequences flanking the transposable
element should not be at the termini of the linear fragment but
should rather include some DNA upstream and downstream from the
region flanked by the repeat sequences.

As was noted above, Tn5 transposition requires a pair of
eighteen or nineteen base long termini. The wild type Tn5
outside end (OE) sequence (5'-CTGACTCTTATACACAAGT-3') (SEQ ID
NO:7) has been described. It has been discovered that a
transposase-catalyzed in vitro transposition frequency at least
as high as that of wild type OE is achieved if the termini in a
construct include bases ATA at positions 10, 11, and 12,
respectively, as well as the nucleotides in common between wild
type OE and IE (e.g., at positions 1-3, 5-9, 13, 14, 16, and
optionally 19). The nucleotides at positions 4, 15, 17, and 18
can correspond to the nucleotides found at those positions in
either wild type OE or wild type IE. It is noted that the
transposition frequency can be enhanced over that of wild type
OE if the nucleotide at position 4 is a T. The importance of
these particular bases to transposition frequency has not
previously been identified.

It is noted that these changes are not intended to
encompass every desirable modification to OE. As is described
elsewhere herein, these attributes of acceptable termini 
modifications were identified by screening mutants having 
randomized differences between IE and OE termini. While the 
presence in the termini of certain nucleotides is shown herein 
to be advantageous, other desirable terminal sequences may yet 
be obtained by screening a larger array of degenerate mutants 
that include changes at positions other than those tested 
herein as well as mutants containing nucleotides not tested in 
the described screening. In addition, it is clear to one 
skilled in the art that if a different transposase is used, it 
may still be possible to select other variant termini, more 
compatible with that particular transposase. 
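
The positional requirements stated above can be checked mechanically.
The following sketch tests only the positions that the text says must
match wild type OE (ATA at positions 10-12 plus positions 1-3, 5-9,
13, 14, and 16). Positions 4, 15, 17, and 18 are deliberately left
unchecked because they may carry either the OE or the IE nucleotide,
and position 19 is treated as optional. The function name is
hypothetical and the check is an illustration, not a predictor of
transposition frequency.

```python
WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO:7, positions 1-19

# 1-based positions that must match wild type OE: those shared by OE and IE
# (1-3, 5-9, 13, 14, 16) together with the ATA at positions 10-12.
FIXED_POSITIONS = (1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16)


def meets_fixed_position_rule(terminus):
    """Return True if an 18 or 19 base terminus matches OE at the fixed positions."""
    terminus = terminus.upper()
    if len(terminus) not in (18, 19):
        return False
    return all(terminus[p - 1] == WILD_TYPE_OE[p - 1] for p in FIXED_POSITIONS)
```

By construction, meets_fixed_position_rule(WILD_TYPE_OE) is True,
while a sequence that replaces the ATA at positions 10-12 fails the
check.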

Among the mutants shown to be desirable and within the
scope of the invention are two hyperactive mutant OE sequences
that were identified in vivo. Although presented here as
single stranded sequences, in fact, the wild type and mutant OE
sequences include complementary second strands. The first
hyperactive mutant, 5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO:8),
differs from the wild type OE sequence at positions 4, 17, and
18, counting from the 5' end, but retains ATA at positions
10-12. The second, 5'-CTGTCTCTTATACAGATCT-3' (SEQ ID NO:9),
differs from the wild type OE sequence at positions 4, 15, 17,
and 18, but also retains ATA at positions 10-12. These two
hyperactive mutant OE sequences differ from one another only at
position 15, where either G or C is present. OE-like activity
(or higher activity) is observed in a mutant sequence when it
contains ATA at positions 10, 11 and 12. It may be possible to
reduce the length of the OE sequence from 19 to 18 nucleotide
pairs with little or no effect.
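
The positional differences stated above can be confirmed directly
from the three sequences. The sketch below is a minimal comparison
using 1-based positions counted from the 5' end; the function names
are hypothetical.

```python
WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO:7
MUTANT_OE_1 = "CTGTCTCTTATACACATCT"   # SEQ ID NO:8
MUTANT_OE_2 = "CTGTCTCTTATACAGATCT"   # SEQ ID NO:9


def differing_positions(reference, other):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(reference) != len(other):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(reference, other), start=1) if a != b]


def has_ata_at_10_to_12(sequence):
    """Return True if positions 10, 11 and 12 read A, T, A."""
    return sequence[9:12] == "ATA"
```

Calling differing_positions(WILD_TYPE_OE, MUTANT_OE_1) lists the
positions at which the first mutant departs from wild type, and
differing_positions(MUTANT_OE_1, MUTANT_OE_2) isolates the single
position at which the two mutants differ from one another.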

When one of the identified hyperactive mutant OE sequences
flanks a substrate DNA, the in vivo transposition frequency of
EK54/MA56 transposase is increased approximately 40-60 fold
over the frequency that is observed when wild type OE termini
flank the transposable DNA. The EK54/MA56 transposase is
already known to have an in vivo transposition frequency
approximately 8-10 fold higher than wild type transposase,
using wild type OE termini. Tn5 transposase having the
EK54/MA56 mutation is known to bind with greater avidity to OE
and with lesser avidity to the Tn5 inside ends (IE) than wild
type transposase.

A suitable mutant terminus in a construct for use in the
assays of the present invention is characterized biologically
as yielding more papillae per colony in a comparable time, say
68 hours, than is observed in colonies harboring wild type OE
in a comparable plasmid. Wild type OE can yield about 100
papillae per colony when measured 68 hours after plating in a
papillation assay using EK54/MA56 transposase, as is described
elsewhere herein. A preferred mutant would yield between about
200 and 3000 papillae per colony, and a more preferred mutant
between about 1000 and 3000 papillae per colony, when measured
in the same assay and time frame. A most preferred mutant
would yield between about 2000 and 3000 papillae per colony
when assayed under the same conditions. Papillation levels may
be even greater than 3000 per colony, although it is difficult
to quantitate at such levels.
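
By way of illustration only, the nested preference ranges recited above can be expressed as a simple classification. The sketch below is a hedged example: the function name `classify_papillation` and the returned labels are assumptions, and the "about" in each recited range is treated as an exact boundary purely for convenience.

```python
def classify_papillation(papillae_per_colony):
    """Map a papillae count (68 hours after plating) to the narrowest recited range.

    Boundaries follow the approximate ranges in the text; treating them as
    exact cut-offs is an assumption of this sketch.
    """
    n = papillae_per_colony
    if n < 0:
        raise ValueError("count cannot be negative")
    if n > 3000:
        return "above about 3000 (difficult to quantitate)"
    if n >= 2000:
        return "most preferred"
    if n >= 1000:
        return "more preferred"
    if n >= 200:
        return "preferred"
    return "not within a preferred range"


# Wild type OE yields about 100 papillae per colony in this assay.
for count in (100, 250, 1500, 2500, 3500):
    print(count, classify_papillation(count))
```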

Transposition frequency is also substantially enhanced in
the in vitro transposition assay of the present invention when
substrate DNA is flanked by a preferred mutant OE sequence and
a most preferred mutant transposase (comprising EK54/MA56/LP372
mutations) is used. Under those conditions, essentially all of
the substrate DNA is converted into transposition products.

The rate of in vitro transposition observed using the
hyperactive termini is sufficiently high that, in the
experience of the inventors, there is no need to select for
transposition events. All colonies selected at random after
transformation for further study have shown evidence of
transposition events.

This advance can represent a significant savings in time
and laboratory effort. For example, it is particularly
advantageous to be able to improve in vitro transposition
frequency by modifying DNA rather than by modifying the
transposase because as transposase activity increases in host
cells, there is an increased likelihood that cells containing
the transposase are killed during growth as a result of
aberrant DNA transpositions. In contrast, DNA of interest
containing the modified OE termini can be grown in sources
completely separate from the transposase, thus not putting the
host cells at risk.

Without intending to limit the scope of this aspect of 
this invention, it is apparent that the tested hyperactive 
termini do not bind with greater avidity to the transposase 
than do wild type OE termini. Thus, the higher transposition 
frequency brought about by the hyperactive termini is not due 
to enhanced binding to transposase. 

The transposable element between the termini can include 
any desired nucleotide sequence. The length of the 
transposable element between the termini should be at least 
about 50 base pairs, although smaller inserts may work. No 
upper limit to the insert size is known. However, it is known 
that a donor DNA portion of about 300 nucleotides in length can 
function well. By way of non-limiting examples, the
transposable element can include a coding region that encodes a 
detectable or selectable protein, with or without associated 
regulatory elements such as promoter, terminator, or the like. 

If the element includes such a detectable or selectable 
coding region without a promoter, it will be possible to 
identify and map promoters in the target DNA that are uncovered 
by transposition of the coding region into a position 
downstream thereof, followed by analysis of the nucleic acid 
sequences upstream from the transposition site. 

Likewise, the element can include a primer binding site 
that can be transposed into the target DNA, to facilitate 
sequencing methods or other methods that rely upon the use of 
primers distributed throughout the target genetic material. 
Similarly, the method can be used to introduce a desired 
restriction enzyme site or polylinker, or a site suitable for 
another type of recombination, such as a cre-lox site, into the
target.

The invention can be better understood upon consideration
of the following examples which are intended to be exemplary
and not limiting on the invention.



EXAMPLES 

To obtain the transposase modified at position 54, the
first third of the coding region from an existing DNA clone
that encodes the Tn5 transposase but not the inhibitor protein
(MA56) was mutagenized according to known methods and DNA
fragments containing the mutagenized portion were cloned to
produce a library of plasmid clones containing a full length
transposase gene. The clones making up the library were
transformed into E. coli K-12 strain MDW320 bacteria which were
plated and grown into colonies. Transposable elements provided
in the bacteria on a separate plasmid contained a defective
lacZ gene. The separate plasmid, pOXgen386, was described by
Weinreich, M. et al., "A functional analysis of the Tn5
Transposase: Identification of Domains Required for DNA Binding
and Dimerization," J. Mol. Biol. 241:166-177 (1993),
incorporated herein by reference. Colonies having elevated
transposase activity were selected by screening for blue (LacZ)
spots in white colonies grown in the presence of X-gal. This
papillation assay was described by Weinreich, et al. (1993),
supra. The 5'-most third of the Tn5 transposase genes from such
colonies was sequenced to determine whether a mutation was
responsible for the increase in transposase activity. It was
determined that a mutation at position 54 to lysine (K)
correlated well with the increase in transposase activity.
Plasmid pRZ5412-EK54 contains lysine at position 54 as well as
the described alanine at position 56.

The fragment containing the LP372 mutation was isolated
from pRZ4870 (Weinreich et al. (1994)) using restriction enzymes
NheI and BglII, and was ligated into NheI-BglII cut pRZ5412-EK54
to form a recombinant gene having the mutations at
positions 54, 56 and 372, as described herein and shown in SEQ
ID NO:1. The gene was tested and shown to have at least about
a one hundred fold increase in activity relative to wild type
Tn5 transposase. Each of the mutants at positions 54 and 372
alone had about a 10-fold increase in transposase activity.

The triple-mutant recombinant gene encoding the modified
transposase protein was transferred into commercial T7
expression vector pET-21D (commercially available from Novagen,
Madison, WI) by inserting a BspHI/SalI fragment into the NcoI/XhoI
fragment of the pET-21D vector. This cloning puts the modified
transposase gene under the control of the T7 promoter, rather
than the natural promoter of the transposase gene. The gene
product was overproduced in BL21(DE3)pLysS bacterial host
cells, which do not contain the binding site for the enzyme, by
specific induction in a fermentation process after cell growth
is complete. (See Studier, F. W., et al., "Use of T7 RNA
Polymerase to Direct Expression of Cloned Genes," Methods
Enzymol. 185:60-89 (1990)). The transposase was partially
purified using the method of de la Cruz, modified by inducing
overproduction at 33 or 37°C. After purification, the enzyme
preparation was stored at -70°C in a storage buffer (10%
glycerol, 0.7 M NaCl, 20 mM Tris-HCl, pH 7.5, 0.1% Triton-X100
and 10 mM CHAPS) until use. This storage buffer is to be
considered exemplary and not optimized.

A single plasmid (pRZTL1, Fig. 1) was constructed to serve
as both donor and target DNA in this Example. The complete
sequence of the pRZTL1 plasmid DNA is shown and described in
SEQ ID NO:3. Plasmid pRZTL1 contains two Tn5 19 base pair OE
termini in inverted orientation to each other. Immediately
adjacent to one OE sequence is a gene that would encode
tetracycline resistance, but for the lack of an upstream
promoter. However, the gene is expressed if the tetracycline
resistance gene is placed downstream of a transcribed region
(e.g., under the control of the promoter that promotes
transcription of the chloramphenicol resistance gene also
present on pRZTL1). Thus, the test plasmid pRZTL1 can be
assayed in vivo after the in vitro reaction to confirm that
transposition has occurred. The plasmid pRZTL1 also includes
an origin of replication in the transposable element, which
ensures that all transposition products are plasmids that can
replicate after introduction in host cells.

The following components were used in typical 20 µl in
vitro transposition reactions:

Modified transposase: 2 µl (approximately 0.1 µg
enzyme/µl) in storage buffer (10% glycerol, 0.7 M NaCl, 20
mM Tris-HCl, pH 7.5, 0.1% Triton-X100 and 10 mM CHAPS)

Donor/Target DNA: 18 µl (approximately 1-2 µg) in
reaction buffer (at final reaction concentrations of 0.1 M
potassium glutamate, 25 mM Tris acetate, pH 7.5, 10 mM
Mg2+-acetate, 50 µg/ml BSA, 0.5 mM β-mercaptoethanol, 2 mM
spermidine, 100 µg/ml tRNA).

At 20°C, the transposase was combined with pRZTL1 DNA for
about 60 minutes. Then, the reaction volume was increased by
adding two volumes of reaction buffer and the temperature was
raised to 37°C for 2-3 hours whereupon cleavage and strand
transfer occurred.
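
The arithmetic implied by the typical reaction described above can be sketched as follows. This is an illustrative calculation only. It assumes that "two volumes" means twice the 20 µl starting volume, and that NaCl enters the reaction solely through the enzyme storage buffer; neither assumption is stated in the text, and all names below are introduced for the sketch.

```python
# Values taken from the typical reaction described above.
ENZYME_VOLUME_UL = 2.0
ENZYME_CONC_UG_PER_UL = 0.1   # approximate
DNA_VOLUME_UL = 18.0
STORAGE_NACL_M = 0.7

# Assumption: "two volumes" of reaction buffer = 2 x the starting volume.
DILUTION_VOLUMES = 2


def reaction_summary():
    """Return volumes, enzyme mass and carried-over NaCl for the two stages."""
    initial = ENZYME_VOLUME_UL + DNA_VOLUME_UL
    final = initial * (1 + DILUTION_VOLUMES)
    nacl_moles_equiv = STORAGE_NACL_M * ENZYME_VOLUME_UL  # M x µl
    return {
        "initial_volume_ul": initial,
        "final_volume_ul": final,
        "enzyme_mass_ug": ENZYME_VOLUME_UL * ENZYME_CONC_UG_PER_UL,
        # Assumption: no NaCl is contributed by the DNA or reaction buffer.
        "nacl_initial_M": nacl_moles_equiv / initial,
        "nacl_final_M": nacl_moles_equiv / final,
    }


for key, value in reaction_summary().items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions the 20 µl pre-incubation contains about 0.2 µg of enzyme and about 0.07 M NaCl, and the volume after dilution is 60 µl.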

Efficient in vitro transposition was shown to have
occurred by in vivo and by in vitro methods. In vivo, many
tetracycline-resistant colonies were observed after
transferring the nucleic acid product of the reaction into DH5α
bacterial cells. As noted, tetracycline resistance can only
arise in this system if the transposable element is transposed
downstream from an active promoter elsewhere on the plasmid. A
typical transposition frequency was 0.1% of cells that received
plasmid DNA, as determined by counting chloramphenicol
resistant colonies. However, this number underestimates the
total transposition event frequency because the detection
system limits the target to 1/16 of the total.
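
The size of the underestimate can be illustrated with a short calculation. The sketch below scales the observed frequency by the reciprocal of the detectable target fraction. This is a rough estimate that assumes insertions are distributed evenly over all possible targets and that every detectable event is scored; the text does not assert either point, and the function name is an assumption of the sketch.

```python
from fractions import Fraction


def estimate_total_frequency(observed_frequency, detectable_fraction):
    """Scale an observed frequency up to an estimated total event frequency.

    Assumes (for illustration) that events are spread evenly across targets,
    so that only `detectable_fraction` of all events can be detected.
    """
    if not 0 < detectable_fraction <= 1:
        raise ValueError("detectable_fraction must be in (0, 1]")
    return observed_frequency / detectable_fraction


observed = Fraction(1, 1000)    # 0.1% of cells that received plasmid DNA
detectable = Fraction(1, 16)    # detection limited to 1/16 of the total target

total = estimate_total_frequency(observed, detectable)
print(float(total))  # estimated total frequency as a fraction of cells
```

Under those assumptions the estimated total event frequency is 16 times the observed value, or about 1.6% of cells that received plasmid DNA.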

Moreover, in vitro electrophoretic (1% agarose) and DNA
sequencing analyses of DNA isolated from purified colonies
revealed products of true transposition events, including both
intramolecular and intermolecular events. Results of typical
reactions using circular plasmid pRZTL1 substrates are shown in
lanes 4 and 5 of Fig. 2. Lane 6 of Fig. 2 shows the results
obtained using linear plasmid pRZTL1 substrates.

The bands were revealed on 1% agarose gels by staining
with SYBR Green (FMC BioProducts) and were scanned on a
Fluorimager SI (Molecular Dynamics). In Figure 2, lane 1 shows
relaxed circle, linear, and closed circle versions of pRZTL1.
Lanes 2 and 3 show intramolecular and intermolecular
transposition products after in vitro transposition of pRZTL1,
respectively. The products were purified from electroporated
DH5α cells and were proven by size and sequence analysis to be
genuine transposition products. Lanes 4 and 5 represent
products of two independent in vitro reactions using a mixture
of closed and relaxed circular test plasmid substrates. In
lane 6, linear pRZTL1 (XhoI-cut) was the reaction substrate.
Lane 7 includes a BstEII digest of lambda DNA as a molecular
weight standard.

Fig. 3 reproduces lanes 4, 5, and 6 of Fig. 2 and shows an 
analysis of various products, based upon secondary restriction 
digest experiments and re-electroporation and DNA sequencing. 
The released donor DNA corresponds to the fragment of pRZTL1
that contains the kanamycin resistance gene between the two OE 
sequences, or, in the case of the linear substrate, the OE-XhoI 
fragment. Intermolecular transposition products can be seen 
only as relaxed DNA circles. Intramolecular transposition 
products are seen as a ladder, which results from conversion of 
the initial superhelicity of the substrate into DNA knots. The 
reaction is efficient enough to achieve double transposition 
events that are a combination of inter- and intramolecular 
events.

A preliminary investigation was made into the nature of 
the termini involved in a transposition reaction. Wild type 
Tn5 OE and IE sequences were compared and an effort was 
undertaken to randomize the nucleotides at each of the seven 
positions of difference. A population of oligonucleotides 
degenerate at each position of difference was created. Thus, 
individual oligonucleotides in the population randomly included 
either the nucleotide of the wild type OE or the wild type IE 
sequence. In this scheme, 2^7 (128) distinct oligonucleotides
were synthesized using conventional tools. These
oligonucleotides having sequence characteristics of both OE and
IE are referred to herein as OE/IE-like sequences. To avoid
nomenclature issues that arise because the oligonucleotides are
intermediate between OE and IE wild type sequences, the
applicants herein note that selected oligonucleotide sequences
are compared to the wild type OE rather than to wild type IE,
unless specifically noted. It will be appreciated by one
skilled in the art that if IE is selected as the reference
point, the differences are identical but are identified
differently.

The following depicts the positions (x) that were varied
in this mutant production scheme. WT OE is shown also at SEQ
ID NO: 7, WT IE at SEQ ID NO: 10.

5'-CTGACTCTTATACACAAGT-3' (WT OE)
      x     xxx  x xx     (positions of difference)
5'-CTGTCTCTTGATCAGATCT-3' (WT IE)


In addition to the degenerate OE/IE-like sequences, the
37-base-long synthetic oligonucleotides also included terminal
SphI and KpnI restriction enzyme recognition and cleavage sites
for convenient cloning of the degenerate oligonucleotides into
plasmid vectors. Thus, a library of randomized termini was
created from a population of 2^7 (128) types of degenerate
oligonucleotides.
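
The composition of such a library can be enumerated in a few lines. The following Python sketch is illustrative only: it models just the 19-base OE/IE-like core (not the full 37-base oligonucleotides with their SphI and KpnI sites), and the function name `build_library` is an assumption. At each position where wild type OE and wild type IE differ, it allows either the OE base or the IE base.

```python
from itertools import product

WT_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7
WT_IE = "CTGTCTCTTGATCAGATCT"  # SEQ ID NO: 10


def build_library(oe, ie):
    """Enumerate every OE/IE-like core that mixes OE and IE bases.

    Returns the 0-based variable positions and the set of distinct sequences.
    """
    if len(oe) != len(ie):
        raise ValueError("termini must have the same length")
    variable = [i for i, (a, b) in enumerate(zip(oe, ie)) if a != b]
    library = set()
    for choice in product((False, True), repeat=len(variable)):
        seq = list(oe)
        for index, use_ie in zip(variable, choice):
            if use_ie:
                seq[index] = ie[index]
        library.add("".join(seq))
    return variable, library


variable_positions, library = build_library(WT_OE, WT_IE)
print([i + 1 for i in variable_positions])  # 1-based positions of difference
print(len(library))                         # number of distinct cores
```

The sketch finds seven positions of difference (4, 10, 11, 12, 15, 17 and 18) and 128 distinct cores, which include both wild type sequences.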

Fig. 4 shows pRZ1496, the complete sequence of which is
presented as SEQ ID NO:11. The following features are noted in
the sequence:

Feature               Position
WT OE                 94-112
LacZ coding           135-3137
LacY coding           3199-4486
LacA coding           4553-6295
tet^r coding          6669-9442
transposase coding    10683-12111 (Comp. Strand)
Cassette IE           12184-12225
colE1 sequence        127732-19182

The IE cassette shown in Fig. 4 was excised using SphI and
KpnI and was replaced, using standard cleavage and ligation
methods, by the synthetic termini cassettes comprising OE/IE-like
portions. Between the fixed wild type OE sequence and the
OE/IE-like cloned sequence, plasmid pRZ1496 comprises a gene
whose activity can be detected, namely LacZYA, as well as a
selectable marker gene, tet^r. The LacZ gene is defective in
that it lacks suitable transcription and translation initiation
signals. The LacZ gene is transcribed and translated only when
it is transposed into a position downstream from such signals.

The resulting clones were transformed using
electroporation into dam-, LacZ- bacterial cells, in this case
JCM101/pOXgen cells which were grown at 37°C in LB medium under
standard conditions. A dam- strain is preferred because dam
methylation can inhibit IE utilization and wild type IE
sequences include two dam methylation sites. A dam- strain
eliminates dam methylation as a consideration in assessing
transposition activity. The Tet^r cells selected were LacZ-;
transposition-activated Lac expression was readily detectable
against a negative background. pOXgen is a non-essential F
factor derivative that need not be provided in the host cells.

In some experiments, the EK54/MA56 transposase was encoded
directly by the transformed pRZ1496 plasmid. In other
experiments, the pRZ1496 plasmid was modified by deleting a
unique HindIII/EagI fragment (nucleotides 9112-12083) from the
plasmid (see Fig. 4) to prevent transposase production. In the
latter experiments, the host cells were co-transformed with the
HindIII/EagI-deleted plasmid, termed pRZ5451 (Fig. 4), and with
an EK54/MA56 transposase-encoding chloramphenicol-resistant
plasmid. In some experiments, a comparable plasmid encoding a
wild type Tn5 transposase was used for comparison.

Transposition frequency was assessed by a papillation
assay that measured the number of blue spots (Lac producing
cells or "papillae") in an otherwise white colony. Transformed
cells were plated (approx. 50 colonies per plate) on Glucose
minimal Miller medium (Miller, J., Experiments in Molecular
Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
(1972)) containing 0.3% casamino acids,
5-bromo-4-chloro-3-indolyl-β-D-galactoside (40 µg/ml) and
phenyl-β-D-galactoside (0.05%). The medium contained
tetracycline (15 µg/ml) and, where needed, chloramphenicol
(20 µg/ml). Colonies that survived the selection were evaluated
for transposition frequency in vivo. Although colonies
exhibiting superior papillation were readily apparent to the
naked eye, the number of blue spots per colony was determined
over a period of several days (approximately 90 hours
post-plating).

To show that the high-papillation phenotype was conferred
by the end mutations in the plasmids, colonies were re-streaked
if they appeared to have papillation levels higher than was
observed when wild type IE was included on the plasmid.
Colonies from the streaked culture plates were picked and
cultured. DNA was obtained and purified from the cultured
cells, using standard protocols, and was transformed again into
"clean" JCM101/pOXgen cells. Papillation levels were again
compared with wild type IE-containing plasmids in the
above-noted assays, and consistent results were observed.

To obtain DNA for sequencing of the inserted
oligonucleotide, cultures were grown from white portions of 117
hyperpapillating colonies, and DNA was prepared from each
colony using standard DNA miniprep methods. The DNA sequence
of the OE/IE-like portion of 117 clones was determined (42 from
transformations using pRZ1496 as the cloning vehicle; 75 from
transformations using pRZ5451 as the cloning vehicle). Only 29
unique mutants were observed. Many mutants were isolated
multiple times. All mutants that showed the highest
papillation frequencies contain OE-derived bases at positions
10, 11, and 12. When the OE-like bases at these positions were
maintained, it was impossible to measure the effect on
transposition of other changes, since the papillation level was
already extremely high.

One thousand five hundred seventy five colonies were
screened as described above. The likelihood that all 128
possible mutant sequences were screened was greater than 95%.
Thus, it is unlikely that other termini that contribute to a
greater transposition frequency will be obtained using the
tested transposase.
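
The plausibility of the stated coverage can be checked with a simple bound. The sketch below is illustrative and rests on an assumption the text does not make: that each screened colony carries one of the 128 sequences chosen independently and with equal probability (in practice, oligonucleotide synthesis and cloning may be biased). It applies the union bound, so the returned value is a lower limit on the probability that every sequence was sampled at least once. The function name is an assumption of the sketch.

```python
def coverage_lower_bound(n_variants, n_colonies):
    """Lower bound on P(every variant appears at least once).

    Assumes uniform, independent sampling of variants (an idealization).
    Uses the union bound: P(some variant missing) <= n * (1 - 1/n) ** k.
    """
    if n_variants < 1 or n_colonies < 0:
        raise ValueError("invalid counts")
    p_specific_variant_missing = (1 - 1 / n_variants) ** n_colonies
    return max(0.0, 1 - n_variants * p_specific_variant_missing)


bound = coverage_lower_bound(n_variants=128, n_colonies=1575)
print(f"P(all 128 sequences screened) >= {bound:.4f}")
```

Under the uniform-sampling assumption the bound exceeds 0.99, which is consistent with the statement that the likelihood was greater than 95%.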
Tables I and II report the qualitative papillation level
of mutant constructs carrying the indicated hybrid end 
sequences or the wild type OE or IE end sequences. In the 
tables, the sequence at each position of the terminus 
corresponds to wild type IE unless otherwise noted. The 
applicants intend that, while the sequences are presented in 
shorthand notation, one of ordinary skill can readily determine 
the complete 19 base pair sequence of every presented mutant, 
and this specification is to be read to include all such 
complete sequences. Table I includes data from trials where 
the EK54 transposase was provided in trans; Table II, from 
those trials where the EK54 transposase was provided in cis. 
Although a transposase provided in cis is more active in 
absolute terms than a transposase provided in trans, the cis or 
trans source of the transposase does not alter the relative in 
vivo transposition frequencies of the tested termini. 

Tables I and II show that every mutant that retains ATA at 
positions 10, 11, and 12, respectively, had an activity 
comparable to, or higher than, wild type OE, regardless of
whether the wild type OE activity was medium (Table I, trans)
or high (Table II, cis). Moreover, whenever that three-base
sequence in a mutant was not ATA, the mutant exhibited lower 
papillation activity than wild type OE. It was also noted that 
papillation is at least comparable to, and tends to be 
significantly higher than, wild type OE when position 4 is a T. 

Quantitative analysis of papillation levels was difficult, 
beyond the comparative levels shown (very low, low, medium, 
medium high, and high). However, one skilled in the art can
readily note the papillation level of OE and can recognize 
those colonies having comparable or higher levels. It is 
helpful to observe the papillae with magnification. 

The number of observed papillae increased over time, as is
shown in Figs. 5-7 which roughly quantitate the papillation
observed in cells transformed separately with 9 clones
containing either distinct synthetic termini cassettes or wild
type OE or IE termini. In these 3 figures, each mutant is
identified by its differences from the wild type IE sequence.
Note that, among the tested mutants, only mutant 10A/11T/12A
had a higher transposition papillation level than wild type OE.
That mutant (which would be called mutant 4/15/17/18 when OE is
the reference sequence) was the only mutant of those shown in
Figs. 5-7 that retained the nucleotides ATA at positions 10,
11, and 12. Figs. 5 (y-axis: 0 - 1500 papillae) and 6 (y-axis:
0 - 250 papillae) show papillation using various mutants plus
IE and OE controls and the EK54/MA56 enzyme. Fig. 7 (y-axis: 0
- 250 papillae) shows papillation when the same mutant
sequences were tested against the wild type (more properly,
MA56) transposase. The 10A/11T/12A mutant (SEQ ID NO: 9)
yielded significantly more papillae (approximately 3000) in a
shorter time (68 hours) with EK54/MA56 transposase than was
observed even after 90 hours with the WT OE (approximately
1500). A single OE-like nucleotide at position 15 on an IE-like
background also increased papillation frequency.

In vivo transposition frequency was also quantitated in a
tetracycline-resistance assay using two sequences having high
levels of hyperpapillation. These sequences were
5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO: 8), which differs from the
wild type OE sequence at positions 4, 17, and 18, counting from
the 5' end, and 5'-CTGTCTCTTATACAGATCT-3' (SEQ ID NO: 9), which
differs from the wild type OE at positions 4, 15, 17, and 18.
These sequences are considered the preferred mutant termini in
an assay using a transposase that contains EK54/MA56 or a
transposase that contains MA56. Each sequence was separately
engineered into pRZTL1 in place of the plasmid's two wild type
OE sequences. A PCR-amplified fragment containing the desired
ends flanking the kanamycin resistance gene was readily cloned
into the large HindIII fragment of pRZTL1. The resulting
plasmids are identical to pRZTL1 except at the indicated
termini. For comparison, pRZTL1 and a derivative of pRZTL1
containing two wild type IE sequences were also tested. In the
assay, JCM101/pOXgen cells were co-transformed with a test
plasmid (pRZTL1 or derivative) and a high copy number amp^r
plasmid that encoded either the EK54/MA56 transposase or wild
type (MA56) transposase. The host cells become tetracycline
resistant only when a transposition event brings the Tet^r gene
into downstream proximity with a suitable transcriptional
promoter elsewhere on a plasmid or on the chromosome. The
total number of cells that received the test plasmids was
determined by counting chloramphenicol resistant, ampicillin
resistant colonies. Transposition frequency was calculated by
taking the ratio of tet^r/cam^r amp^r colonies. Approximately 40 to
60 fold increase over wild type OE in in vivo transposition was
observed when using either of the mutant termini and EK54/MA56
transposase. Of the two preferred mutant termini, the one
containing mutations at three positions relative to the wild
type OE sequence yielded a higher increase.
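
The frequency calculation described above is a simple ratio, and the fold increase is a ratio of two such frequencies. The sketch below illustrates the arithmetic only. The colony counts are made-up placeholders chosen to exercise the functions; they are not data from this specification, and the function names are assumptions.

```python
from fractions import Fraction


def transposition_frequency(tet_resistant, cam_amp_resistant):
    """Ratio of tet^r colonies to cam^r amp^r colonies."""
    if cam_amp_resistant <= 0:
        raise ValueError("need at least one cam^r amp^r colony")
    return Fraction(tet_resistant, cam_amp_resistant)


def fold_increase(test_frequency, reference_frequency):
    """Fold change of a test terminus relative to a reference terminus."""
    if reference_frequency == 0:
        raise ValueError("reference frequency is zero")
    return test_frequency / reference_frequency


# HYPOTHETICAL counts, for illustration only (not experimental results).
wild_type_oe = transposition_frequency(tet_resistant=2, cam_amp_resistant=10**8)
mutant_end = transposition_frequency(tet_resistant=100, cam_amp_resistant=10**8)

print(float(wild_type_oe), float(mutant_end))
print(float(fold_increase(mutant_end, wild_type_oe)))
```

Exact fractions are used so that very small frequencies (on the order of 10^-8) are compared without rounding error.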

As is shown in Fig. 8, which plots the tested plasmid 
against the transposition frequency (x 10^-8), little
transposition was seen when the test plasmid included two IE 
termini. Somewhat higher transposition was observed when the 
test plasmid included two OE termini, particularly when the 
EK54/MA56 transposase was employed. In striking contrast, the 
combination of the EK54/MA56 transposase with either of the 
preferred selected ends (containing OE-like bases only at 
positions 10, 11, and 12, or positions 10, 11, 12, and 15) 
yielded a great increase in in vivo transposition over wild 
type OE termini. 

The preferred hyperactive mutant terminus having the most
preferred synthetic terminus sequence 5'-CTGTCTCTTATACACATCT-3'
(SEQ ID NO: 8) was provided in place of both WT OE termini in
pRZTL1 (Fig. 4) and was tested in the in vitro transposition
assay of the present invention using the triple mutant
transposase described herein. This mutant terminus was chosen
for further in vitro analysis because its transposition
frequency was higher than for the second preferred synthetic
terminus and because it has no dam methylation sites, so dam
methylation no longer affects transposition frequency. In
contrast, the 4/15/17/18 mutant does have a dam methylation
site.
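
The dam methylation status of each terminus can be confirmed by inspection of its sequence, since Dam methylase recognizes the tetranucleotide GATC. The following short Python sketch, provided for illustration, counts GATC occurrences in the termini recited in this specification; the dictionary keys and the function name are labels introduced for the sketch.

```python
DAM_SITE = "GATC"  # recognition sequence of Dam methylase

TERMINI = {
    "WT OE (SEQ ID NO: 7)": "CTGACTCTTATACACAAGT",
    "WT IE (SEQ ID NO: 10)": "CTGTCTCTTGATCAGATCT",
    "mutant 4/17/18 (SEQ ID NO: 8)": "CTGTCTCTTATACACATCT",
    "mutant 4/15/17/18 (SEQ ID NO: 9)": "CTGTCTCTTATACAGATCT",
}


def count_dam_sites(sequence):
    """Count GATC sites in one strand (GATC cannot overlap itself)."""
    return sequence.upper().count(DAM_SITE)


for name, sequence in TERMINI.items():
    print(f"{name}: {count_dam_sites(sequence)} dam site(s)")
```

The counts are two sites for wild type IE, none for wild type OE or SEQ ID NO: 8, and one for SEQ ID NO: 9, matching the statements in the text. Because GATC is its own reverse complement, counting on one strand is sufficient.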

In a preliminary experiment, CHAPS was eliminated from the
reaction, but the pre-incubation step was used. The reaction
was pre-incubated for 1 hour at 20°C, then diluted two times,
and then incubated for 3 hours at 37°C. About 0.5 µg of DNA
and 0.4 µg of transposase were used. The transposition products
were observed on a gel. With the mutant termini, very little
of the initial DNA was observed. Numerous bands representing
primary and secondary transposition reaction products were
observed. The reaction mixtures were transformed into DH5α
cells and were plated on chloramphenicol-, tetracycline-, or
kanamycin-containing plates.

Six hundred forty chloramphenicol-resistant colonies were
observed. Although these could represent unreacted plasmid,
all such colonies tested (n=12) were sensitive to kanamycin,
which indicates a loss of donor backbone DNA. All twelve
colonies also included plasmids of varied size; 9 of the 12
were characterized as deletion-inversions, and the remaining 3
were simple deletions. Seventy nine tetracycline-resistant
colonies were observed, which indicated an activation of the
tet^r gene by transposition.

Eleven kanamycin resistant colonies were observed. This
indicated a low percentage of remaining plasmids carrying the
donor backbone DNA.

In a second, similar test, about 1 µg of plasmid DNA and
0.2 µg transposase were used. In this test, the reaction was
incubated without CHAPS at 37°C for 3 hours without
pre-incubation or dilution. Some initial DNA was observed in the
gel after the 3 hour reaction. After overnight incubation,
only transposition products were observed.

The 3 hour reaction products were transformed into DH5α
cells and plated as described. About 50% of the
chloramphenicol resistant colonies were sensitive to kanamycin
and were presumably transposition products.

The invention is not intended to be limited to the
foregoing examples, but to encompass all such modifications and
variations as come within the scope of the appended claims.

It is envisioned that, in addition to the uses specifically
noted herein, other applications will be apparent to the
skilled molecular biologist. In particular, methods for
introducing desired mutations into prokaryotic or eukaryotic
DNA are very desirable. For example, at present it is 
difficult to knock out a functional eukaryotic gene by 
homologous recombination with an inactive version of the gene 
that resides on a plasmid. The difficulty arises from the need 
to flank the gene on the plasmid with extensive upstream and 
downstream sequences. Using this system, however, an 
inactivating transposable element containing a selectable 
marker gene (e.g., neo) can be introduced in vitro into a 
plasmid that contains the gene that one desires to inactivate. 
After transposition, the products can be introduced into 
suitable host cells. Using standard selection means, one can 
recover only cell colonies that contain a plasmid having the 
transposable element. Such plasmids can be screened, for 
example by restriction analysis, to recover those that contain 
a disrupted gene. Such clones can then be introduced directly 
into eukaryotic cells for homologous recombination and 
selection using the same marker gene. 

Also, one can use the system to readily insert a
PCR-amplified DNA fragment into a vector, thus avoiding traditional
cloning steps entirely. This can be accomplished by (1)
providing a suitable pair of PCR primers containing OE termini
adjacent to the sequence-specific parts of the primers, (2)
performing standard PCR amplification of a desired nucleic acid
fragment, and (3) performing the in vitro transposition reaction
of the present invention using the double-stranded products of
PCR amplification as the donor DNA.
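
Step (1) above can be sketched as a simple string operation. The example below is a hedged illustration, not a validated primer design: it assumes that the 19-base terminus is placed at the 5' end of each primer, ahead of the sequence-specific part, so that the amplified product is flanked by the termini in inverted orientation. The two sequence-specific parts shown are hypothetical placeholders, and the function name `add_terminus` is an assumption.

```python
WT_OE = "CTGACTCTTATACACAAGT"  # wild type OE, SEQ ID NO: 7


def add_terminus(sequence_specific_part, terminus=WT_OE):
    """Return a primer (5' to 3') with the terminus 5' of the specific part.

    Placing the terminus at the 5' end is an assumption of this sketch.
    """
    primer = (terminus + sequence_specific_part).upper()
    if set(primer) - set("ACGT"):
        raise ValueError("primer may contain only A, C, G and T")
    return primer


# HYPOTHETICAL sequence-specific parts; replace with primers for the
# fragment actually being amplified.
forward_specific = "ATGACCATGATTACGGATTC"
reverse_specific = "TTATTTTTGACACCAGACCA"

forward_primer = add_terminus(forward_specific)
reverse_primer = add_terminus(reverse_specific)
print(forward_primer)
print(reverse_primer)
```

A hyperactive terminus such as SEQ ID NO: 8 could be passed as the `terminus` argument in place of the wild type OE sequence.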
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Reznikoff, William S
               Gorysin, Igor Y
               Zhou, Hong

(ii) TITLE OF INVENTION: System for In Vitro Transposition

(iii) NUMBER OF SEQUENCES: 11

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Quarles & Brady
(B) STREET: 1 South Pinckney Street
(C) CITY: Madison
(D) STATE: WI
(E) COUNTRY: USA
(F) ZIP: 53703

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Berson, Bennett J
(B) REGISTRATION NUMBER: 37094
(C) REFERENCE/DOCKET NUMBER: 960296.94142

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 608/251-5000
(B) TELEFAX: 608-251-9166

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1534 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Gene encoding modified Tn5
transposase enzyme"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 93..1523

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CTGACTCTTA TACACAAGTA GCGTCCTGAA CGGAACCTTT CCCGTTTTCC AGGATCTGAT  60

CTTCCATGTG ACCTCCTAAC ATGGTAACGT TC ATG ATA ACT TCT GCT CTT CAT  113
                                    Met Ile Thr Ser Ala Leu His
                                    1               5

CGT GCG GCC GAC TGG GCT AAA TCT GTG TTC TCT TCG GCG GCG CTG GGT  161
Arg Ala Ala Asp Trp Ala Lys Ser Val Phe Ser Ser Ala Ala Leu Gly
        10                  15                  20

GAT CCT CGC CGT ACT GCC CGC TTG GTT AAC GTC GCC GCC CAA TTG GCA  209
Asp Pro Arg Arg Thr Ala Arg Leu Val Asn Val Ala Ala Gln Leu Ala
    25                  30                  35

AAA TAT TCT GGT AAA TCA ATA ACC ATC TCA TCA GAG GGT AGT AAA GCC  257
Lys Tyr Ser Gly Lys Ser Ile Thr Ile Ser Ser Glu Gly Ser Lys Ala
40                  45                  50                  55

GCC CAG GAA GGC GCT TAC CGA TTT ATC CGC AAT CCC AAC GTT TCT GCC  305
Ala Gln Glu Gly Ala Tyr Arg Phe Ile Arg Asn Pro Asn Val Ser Ala
                60                  65                  70

GAG GCG ATC AGA AAG GCT GGC GCC ATG CAA ACA GTC AAG TTG GCT CAG  353
Glu Ala Ile Arg Lys Ala Gly Ala Met Gln Thr Val Lys Leu Ala Gln
            75                  80                  85

GAG TTT CCC GAA CTG CTG GCC ATT GAG GAC ACC ACC TCT TTG AGT TAT  401
Glu Phe Pro Glu Leu Leu Ala Ile Glu Asp Thr Thr Ser Leu Ser Tyr
        90                  95                  100

CGC CAC CAG GTC GCC GAA GAG CTT GGC AAG CTG GGC TCT ATT CAG GAT  449
Arg His Gln Val Ala Glu Glu Leu Gly Lys Leu Gly Ser Ile Gln Asp
    105                 110                 115

AAA TCC CGC GGA TGG TGG GTT CAC TCC GTT CTC TTG CTC GAG GCC ACC  497
Lys Ser Arg Gly Trp Trp Val His Ser Val Leu Leu Leu Glu Ala Thr
120                 125                 130                 135

ACA TTC CGC ACC GTA GGA TTA CTG CAT CAG GAG TGG TGG ATG CGC CCG  545
Thr Phe Arg Thr Val Gly Leu Leu His Gln Glu Trp Trp Met Arg Pro
                140                 145                 150

GAT GAC CCT GCC GAT GCG GAT GAA AAG GAG AGT GGC AAA TGG CTG GCA  593
Asp Asp Pro Ala Asp Ala Asp Glu Lys Glu Ser Gly Lys Trp Leu Ala
            155                 160                 165

GCG GCC GCA ACT AGC CGG TTA CGC ATG GGC AGC ATG ATG AGC AAC GTG  641
Ala Ala Ala Thr Ser Arg Leu Arg Met Gly Ser Met Met Ser Asn Val
        170                 175                 180

ATT GCG GTC TGT GAC CGC GAA GCC GAT ATT CAT GCT TAT CTG CAG GAC  689
Ile Ala Val Cys Asp Arg Glu Ala Asp Ile His Ala Tyr Leu Gln Asp
    185                 190                 195

AGG CTG GCG CAT AAC GAG CGC TTC GTG GTG CGC TCC AAG CAC CCA CGC  737
Arg Leu Ala His Asn Glu Arg Phe Val Val Arg Ser Lys His Pro Arg
200                 205                 210                 215

AAG GAC GTA GAG TCT GGG TTG TAT CTG ATC GAC CAT CTG AAG AAC CAA  785
Lys Asp Val Glu Ser Gly Leu Tyr Leu Ile Asp His Leu Lys Asn Gln
                220                 225                 230

CCG GAG TTG GGT GGC TAT CAG ATC AGC ATT CCG CAA AAG GGC GTG GTG  833
Pro Glu Leu Gly Gly Tyr Gln Ile Ser Ile Pro Gln Lys Gly Val Val
            235                 240                 245

GAT AAA CGC GGT AAA CGT AAA AAT CGA CCA GCC CGC AAG GCG AGC TTG  881
Asp Lys Arg Gly Lys Arg Lys Asn Arg Pro Ala Arg Lys Ala Ser Leu
        250                 255                 260

AGC CTG CGC AGT GGG CGC ATC ACG CTA AAA CAG GGG AAT ATC ACG CTC  929
Ser Leu Arg Ser Gly Arg Ile Thr Leu Lys Gln Gly Asn Ile Thr Leu
    265                 270                 275

AAC GCG GTG CTG GCC GAG GAG ATT AAC CCG CCC AAG GGT GAG ACC CCG  977
Asn Ala Val Leu Ala Glu Glu Ile Asn Pro Pro Lys Gly Glu Thr Pro
280                 285                 290                 295

TTG AAA TGG TTG TTG CTG ACC GGC GAA CCG GTC GAG TCG CTA GCC CAA  1025
Leu Lys Trp Leu Leu Leu Thr Gly Glu Pro Val Glu Ser Leu Ala Gln
                300                 305                 310

GCC TTG CGC GTC ATC GAC ATT TAT ACC CAT CGC TGG CGG ATC GAG GAG  1073
Ala Leu Arg Val Ile Asp Ile Tyr Thr His Arg Trp Arg Ile Glu Glu
            315                 320                 325

TTC CAT AAG GCA TGG AAA ACC GGA GCA GGA GCC GAG AGG CAA CGC ATG  1121
Phe His Lys Ala Trp Lys Thr Gly Ala Gly Ala Glu Arg Gln Arg Met

330 335 340 

GAG GAG CCG GAT AAT CTG GAG CGG ATG GTC TCG ATC CTC TCG TTT GTT 116 9 

Glu Glu Pro Asp Asn Leu Glu Arg Met Val Ser lie Leu Ser Phe Val 
345 350 355 

2 0 GCG GTC AGG CTG TTA CAG CTC AGA GAA AGC TTC ACG CCG CCG CAA GCA 1217 

Ala Val Arg Leu Leu Gin Leu Arg Glu Ser Phe Thr Pro Pro Gin Ala 
360 365 370 375 

CTC AGG GCG CAA GGG CTG CTA AAG GAA GCG GAA CAC GTA GAA AGC CAG 12 6 5 

Leu Arg Ala Gin Gly Leu Leu Lys Glu Ala Glu His Val Glu Ser Gin 
25 380 385 390 

TCC GCA GAA ACG GTG CTG ACC CCG GAT GAA TGT CAG CTA CTG GGC TAT 1313 
Ser Ala Glu Thr Val Leu Thr Pro Asp Glu Cys Gin Leu Leu Gly Tyr 
395 400 405 

CTG GAC AAG GGA AAA CGC AAG CGC AAA GAG AAA GCA GGT AGC TTG CAG 13 61 

3 0 Leu Asp Lys Gly Lys Arg Lys Arg Lys Glu Lys Ala Gly Ser Leu Gin 

410 415 420 

TGG GCT TAC ATG GCG ATA GCT AGA CTG GGC GGT TTT ATG GAC AGC AAG 14 09 

Trp Ala Tyr Met Ala lie Ala Arg Leu Gly Gly Phe Met Asp Ser Lys 
425 430 435 

3 5 CGA ACC GGA ATT GCC AGC TGG GGC GCC CTC TGG GAA GGT TGG GAA GCC 1457 

Arg Thr Gly lie Ala Ser Trp Gly Ala Leu Trp Glu Gly Trp Glu Ala 
440 445 450 455 

CTG CAA AGT AAA CTG GAT GGC TTT CTT GCC GCC AAG GAT CTG ATG GCG 15 05 

Leu Gin Ser Lys Leu Asp Gly Phe Leu Ala Ala Lys Asp Leu Met Ala 
40 460 465 470 

CAG GGG ATC AAG ATC TGA T C AAG AG AC A G 1534 
Gin Gly He Lys He * 
475 

(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 477 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ile Thr Ser Ala Leu His Arg Ala Ala Asp Trp Ala Lys Ser Val
1 5 10 15

Phe Ser Ser Ala Ala Leu Gly Asp Pro Arg Arg Thr Ala Arg Leu Val
20 25 30

Asn Val Ala Ala Gln Leu Ala Lys Tyr Ser Gly Lys Ser Ile Thr Ile
35 40 45

Ser Ser Glu Gly Ser Lys Ala Ala Gln Glu Gly Ala Tyr Arg Phe Ile
50 55 60

Arg Asn Pro Asn Val Ser Ala Glu Ala Ile Arg Lys Ala Gly Ala Met
65 70 75 80

Gln Thr Val Lys Leu Ala Gln Glu Phe Pro Glu Leu Leu Ala Ile Glu
85 90 95

Asp Thr Thr Ser Leu Ser Tyr Arg His Gln Val Ala Glu Glu Leu Gly
100 105 110

Lys Leu Gly Ser Ile Gln Asp Lys Ser Arg Gly Trp Trp Val His Ser
115 120 125

Val Leu Leu Leu Glu Ala Thr Thr Phe Arg Thr Val Gly Leu Leu His
130 135 140

Gln Glu Trp Trp Met Arg Pro Asp Asp Pro Ala Asp Ala Asp Glu Lys
145 150 155 160

Glu Ser Gly Lys Trp Leu Ala Ala Ala Ala Thr Ser Arg Leu Arg Met
165 170 175

Gly Ser Met Met Ser Asn Val Ile Ala Val Cys Asp Arg Glu Ala Asp
180 185 190

Ile His Ala Tyr Leu Gln Asp Arg Leu Ala His Asn Glu Arg Phe Val
195 200 205

Val Arg Ser Lys His Pro Arg Lys Asp Val Glu Ser Gly Leu Tyr Leu
210 215 220

Ile Asp His Leu Lys Asn Gln Pro Glu Leu Gly Gly Tyr Gln Ile Ser
225 230 235 240

Ile Pro Gln Lys Gly Val Val Asp Lys Arg Gly Lys Arg Lys Asn Arg
245 250 255

Pro Ala Arg Lys Ala Ser Leu Ser Leu Arg Ser Gly Arg Ile Thr Leu
260 265 270

Lys Gln Gly Asn Ile Thr Leu Asn Ala Val Leu Ala Glu Glu Ile Asn
275 280 285

Pro Pro Lys Gly Glu Thr Pro Leu Lys Trp Leu Leu Leu Thr Gly Glu
290 295 300

Pro Val Glu Ser Leu Ala Gln Ala Leu Arg Val Ile Asp Ile Tyr Thr
305 310 315 320

His Arg Trp Arg Ile Glu Glu Phe His Lys Ala Trp Lys Thr Gly Ala
325 330 335

Gly Ala Glu Arg Gln Arg Met Glu Glu Pro Asp Asn Leu Glu Arg Met
340 345 350

Val Ser Ile Leu Ser Phe Val Ala Val Arg Leu Leu Gln Leu Arg Glu
355 360 365

Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu
370 375 380

Ala Glu His Val Glu Ser Gln Ser Ala Glu Thr Val Leu Thr Pro Asp
385 390 395 400

Glu Cys Gln Leu Leu Gly Tyr Leu Asp Lys Gly Lys Arg Lys Arg Lys
405 410 415

Glu Lys Ala Gly Ser Leu Gln Trp Ala Tyr Met Ala Ile Ala Arg Leu
420 425 430

Gly Gly Phe Met Asp Ser Lys Arg Thr Gly Ile Ala Ser Trp Gly Ala
435 440 445

Leu Trp Glu Gly Trp Glu Ala Leu Gln Ser Lys Leu Asp Gly Phe Leu
450 455 460

Ala Ala Lys Asp Leu Met Ala Gln Gly Ile Lys Ile *
465 470 475

(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5838 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Plasmid DNA"
(vii) IMMEDIATE SOURCE:
(B) CLONE: pRZTL1
(ix) FEATURE:
(A) NAME/KEY: insertion_seq
(B) LOCATION: 1..19
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 77..1267
(D) OTHER INFORMATION: /function= "tetracycline resistance"
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (2301..2960)
(D) OTHER INFORMATION: /function= "chloramphenicol resistance"
(ix) FEATURE:
(A) NAME/KEY: insertion_seq
(B) LOCATION: 4564..4582
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 4715..5530
(D) OTHER INFORMATION: /function= "kanamycin resistance"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CTGACTCTTA TACACAAGTA AGCTTTAATG CGGTAGTTTA TCACAGTTAA ATTGCTAACG 60

CAGTCAGGCA CCGTGT ATG AAA TCT AAC AAT GCG CTC ATC GTC ATC CTC 109
Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu
480 485

GGC ACC GTC ACC CTG GAT GCT GTA GGC ATA GGC TTG GTT ATG CCG GTA 157
Gly Thr Val Thr Leu Asp Ala Val Gly Ile Gly Leu Val Met Pro Val
490 495 500

CTG CCG GGC CTC TTG CGG GAT ATC GTC CAT TCC GAC AGC ATC GCC AGT 205
Leu Pro Gly Leu Leu Arg Asp Ile Val His Ser Asp Ser Ile Ala Ser
505 510 515 520

CAC TAT GGC GTG CTG CTA GCG CTA TAT GCG TTG ATG CAA TTT CTA TGC 253
His Tyr Gly Val Leu Leu Ala Leu Tyr Ala Leu Met Gln Phe Leu Cys
525 530 535

GCA CCC GTT CTC GGA GCA CTG TCC GAC CGC TTT GGC CGC CGC CCA GTC 301
Ala Pro Val Leu Gly Ala Leu Ser Asp Arg Phe Gly Arg Arg Pro Val
540 545 550

CTG CTC GCT TCG CTA CTT GGA GCC ACT ATC GAC TAC GCG ATC ATG GCG 349
Leu Leu Ala Ser Leu Leu Gly Ala Thr Ile Asp Tyr Ala Ile Met Ala
555 560 565

ACC ACA CCC GTC CTG TGG ATC CTC TAC GCC GGA CGC ATC GTG GCC GGC 397
Thr Thr Pro Val Leu Trp Ile Leu Tyr Ala Gly Arg Ile Val Ala Gly
570 575 580

ATC ACC GGC GCC ACA GGT GCG GTT GCT GGC GCC TAT ATC GCC GAC ATC 445
Ile Thr Gly Ala Thr Gly Ala Val Ala Gly Ala Tyr Ile Ala Asp Ile
585 590 595 600

ACC GAT GGG GAA GAT CGG GCT CGC CAC TTC GGG CTC ATG AGC GCT TGT 493
Thr Asp Gly Glu Asp Arg Ala Arg His Phe Gly Leu Met Ser Ala Cys
605 610 615

TTC GGC GTG GGT ATG GTG GCA GGC CCC GTG GCC GGG GGA CTG TTG GGC 541
Phe Gly Val Gly Met Val Ala Gly Pro Val Ala Gly Gly Leu Leu Gly
620 625 630

GCC ATC TCC TTG CAT GCA CCA TTC CTT GCG GCG GCG GTG CTC AAC GGC 589
Ala Ile Ser Leu His Ala Pro Phe Leu Ala Ala Ala Val Leu Asn Gly
635 640 645

CTC AAC CTA CTA CTG GGC TGC TTC CTA ATG CAG GAG TCG CAT AAG GGA 637
Leu Asn Leu Leu Leu Gly Cys Phe Leu Met Gln Glu Ser His Lys Gly
650 655 660

GAG CGT CGA CCG ATG CCC TTG AGA GCC TTC AAC CCA GTC AGC TCC TTC 685
Glu Arg Arg Pro Met Pro Leu Arg Ala Phe Asn Pro Val Ser Ser Phe
665 670 675 680

CGG TGG GCG CGG GGC ATG ACT ATC GTC GCC GCA CTT ATG ACT GTC TTC 733
Arg Trp Ala Arg Gly Met Thr Ile Val Ala Ala Leu Met Thr Val Phe
685 690 695

TTT ATC ATG CAA CTC GTA GGA CAG GTG CCG GCA GCG CTC TGG GTC ATT 781
Phe Ile Met Gln Leu Val Gly Gln Val Pro Ala Ala Leu Trp Val Ile
700 705 710

TTC GGC GAG GAC CGC TTT CGC TGG AGC GCG ACG ATG ATC GGC CTG TCG 829
Phe Gly Glu Asp Arg Phe Arg Trp Ser Ala Thr Met Ile Gly Leu Ser
715 720 725

CTT GCG GTA TTC GGA ATC TTG CAC GCC CTC GCT CAA GCC TTC GTC ACT 877
Leu Ala Val Phe Gly Ile Leu His Ala Leu Ala Gln Ala Phe Val Thr
730 735 740

GGT CCC GCC ACC AAA CGT TTC GGC GAG AAG CAG GCC ATT ATC GCC GGC 925
Gly Pro Ala Thr Lys Arg Phe Gly Glu Lys Gln Ala Ile Ile Ala Gly
745 750 755 760

ATG GCG GCC GAC GCG CTG GGC TAC GTC TTG CTG GCG TTC GCG ACG CGA 973
Met Ala Ala Asp Ala Leu Gly Tyr Val Leu Leu Ala Phe Ala Thr Arg
765 770 775

GGC TGG ATG GCC TTC CCC ATT ATG ATT CTT CTC GCT TCC GGC GGC ATC 1021
Gly Trp Met Ala Phe Pro Ile Met Ile Leu Leu Ala Ser Gly Gly Ile
780 785 790

GGG ATG CCC GCG TTG CAG GCC ATG CTG TCC AGG CAG GTA GAT GAC GAC 1069
Gly Met Pro Ala Leu Gln Ala Met Leu Ser Arg Gln Val Asp Asp Asp
795 800 805

CAT CAG GGA CAG CTT CAA GGA TCG CTC GCG GCT CTT ACC AGC CTA ACT 1117
His Gln Gly Gln Leu Gln Gly Ser Leu Ala Ala Leu Thr Ser Leu Thr
810 815 820

TCG ATC ACT GGA CCG CTG ATC GTC ACG GCG ATT TAT GCC GCC TCG GCG 1165
Ser Ile Thr Gly Pro Leu Ile Val Thr Ala Ile Tyr Ala Ala Ser Ala
825 830 835 840

AGC ACA TGG AAC GGG TTG GCA TGG ATT GTA GGC GCC GCC CTA TAC CTT 1213
Ser Thr Trp Asn Gly Leu Ala Trp Ile Val Gly Ala Ala Leu Tyr Leu
845 850 855

GTC TGC CTC CCC GCG TTG CGT CGC GGT GCA TGG AGC CGG GCC ACC TCG 1261
Val Cys Leu Pro Ala Leu Arg Arg Gly Ala Trp Ser Arg Ala Thr Ser
860 865 870

ACC TGA ATGGAAGCCG GCGGCACCTC GCTAACGGAT TCACCACTCC AAGAATTGGA 1317
Thr *



GCCAATCAAT TCTTGCGGAG AACTGTGAAT GCGCAAACCA ACCCTTGGCA GAACATATCC 1377

ATCGCGTCCG CCATCTCCAG CAGCCGCACG CGGCGCATCT CGGGCAGCGT TGGGTCCTGG 1437

CCACGGGTGC GCATGATCGT GCTCCTGTCG TTGAGGACCC GGCTAGGCTG GCGGGGTTGC 1497

CTTACTGGTT AGCAGAATGA ATCACCGATA CGCGAGCGAA CGTGAAGCGA CTGCTGCTGC 1557

AAAACGTCTG CGACCTGAGC AACAACATGA ATGGTCTTCG GTTTCCGTGT TTCGTAAAGT 1617

CTGGAAACGC GGAAGTCCCC TACGTGCTGC TGAAGTTGCC CGCAACAGAG AGTGGAACCA 1677

ACCGGTGATA CCACGATACT ATGACTGAGA GTCAACGCCA TGAGCGGCCT CATTTCTTAT 1737

TCTGAGTTAC AACAGTCCGC ACCGCTGTCC GGTAGCTCCT TCCGGTGGGC GCGGGGCATG 1797

ACTATCGTCG CCGCACTTAT GACTGTCTTC TTTATCATGC AACTCGTAGG ACAGGTGCCG 1857

GCAGCGCCCA ACAGTCCCCC GGCCACGGGG CCTGCCACCA TACCCACGCC GAAACAAGCG 1917

CCCTGCACCA TTATGTTCCG GATCTGCATC GCAGGATGCT GCTGGCTACC CTGTGGAACA 1977

CCTACATCTG TATTAACGAA GCGCTAACCG TTTTTATCAG GCTCTGGGAG GCAGAATAAA 2037

TGATCATATC GTCAATTATT ACCTCCACGG GGAGAGCCTG AGCAAACTGG CCTCAGGCAT 2097

TTGAGAAGCA CACGGTCACA CTGCTTCCGG TAGTCAATAA ACCGGTAAAC CAGCAATAGA 2157

CATAAGCGGC TATTTAACGA CCCTGCCCTG AACCGACGAC CGGGTCGAAT TTGCTTTCGA 2217

ATTTCTGCCA TTCATCCGCT TATTATCAAT TATTCAGGCG TAGCACCAGG CGTTTAAGGG 2277

CACCAATAAC TGCCTTAAAA AAATTACGCC CCGCCCTGCC ACTCATCGCA GTACTGTTGT 2337

AATTCATTAA GCATTCTGCC GACATGGAAG CCATCACAGA CGGCATGATG AACCTGAATC 2397

GCCAGCGGCA TCAGCACCTT GTCGCCTTGC GTATAATATT TGCCCATGGT GAAAACGGGG 2457

GCGAAGAAGT TGTCCATATT GGCCACGTTT AAATCAAAAC TGGTGAAACT CACCCAGGGA 2517

TTGGCTGAGA CGAAAAACAT ATTCTCAATA AACCCTTTAG GGAAATAGGC CAGGTTTTCA 2577

CCGTAACACG CCACATCTTG CGAATATATG TGTAGAAACT GCCGGAAATC GTCGTGGTAT 2637

TCACTCCAGA GCGATGAAAA CGTTTCAGTT TGCTCATGGA AAACGGTGTA ACAAGGGTGA 2697

ACACTATCCC ATATCACCAG CTCACCGTCT TTCATTGCCA TACGGAATTC CGGATGAGCA 2757

TTCATCAGGC GGGCAAGAAT GTGAATAAAG GCCGGATAAA ACTTGTGCTT ATTTTTCTTT 2817

ACGGTCTTTA AAAAGGCCGT AATATCCAGC TGAACGGTCT GGTTATAGGT ACATTGAGCA 2877

ACTGACTGAA ATGCCTCAAA ATGTTCTTTA CGATGCCATT GGGATATATC AACGGTGGTA 2937

TATCCAGTGA TTTTTTTCTC CATTTTAGCT TCCTTAGCTC CTGAAAATCT CGATAACTCA 2997

AAAAATACGC CCGGTAGTGA TCTTATTTCA TTATGGTGAA AGTTGGAACC TCTTACGTGC 3057

CGATCAACGT CTCATTTTCG CCAAAAGTTG GCCCAGGGCT TCCCGGTATC AACAGGGACA 3117

CCAGGATTTA TTTATTCTGC GAAGTGATCT TCCGTCACAG GTATTTATTC GGCGCAAAGT 3177

GCGTCGGGTG ATGCTGCCAA CTTACTGATT TAGTGTATGA TGGTGTTTTT GAGGTGCTCC 3237

AGTGGCTTCT GTTTCTATCA GCTGTCCCTC CTGTTCAGCT ACTGACGGGG TGGTGCGTAA 3297

CGGCAAAAGC ACCGCCGGAC ATCAGCGCTA GCGGAGTGTA TACTGGCTTA CTATGTTGGC 3357

ACTGATGAGG GTGTCAGTGA AGTGCTTCAT GTGGCAGGAG AAAAAAGGCT GCACCGGTGC 3417

GTCAGCAGAA TATGTGATAC AGGATATATT CCGCTTCCTC GCTCACTGAC TCGCTACGCT 3477

CGGTCGTTCG ACTGCGGCGA GCGGAAATGG CTTACGAACG GGGCGGAGAT TTCCTGGAAG 3537

ATGCCAGGAA GATACTTAAC AGGGAAGTGA GAGGGCCGCG GCAAAGCCGT TTTTCCATAG 3597

GCTCCGCCCC CCTGACAAGC ATCACGAAAT CTGACGCTCA AATCAGTGGT GGCGAAACCC 3657

GACAGGACTA TAAAGATACC AGGCGTTTCC CCTGGCGGCT CCCTCGTGCG CTCTCCTGTT 3717

CCTGCCTTTC GGTTTACCGG TGTCATTCCG CTGTTATGGC CGCGTTTGTC TCATTCCACG 3777

CCTGACACTC AGTTCCGGGT AGGCAGTTCG CTCCAAGCTG GACTGTATGC ACGAACCCCC 3837

CGTTCAGTCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGAAAG 3897

ACATGCAAAA GCACCACTGG CAGCAGCCAC TGGTAATTGA TTTAGAGGAG TTAGTCTTGA 3957

AGTCATGCGC CGGTTAAGGC TAAACTGAAA GGACAAGTTT TGGTGACTGC GCTCCTCCAA 4017

GCCAGTTACC TCGGTTCAAA GAGTTGGTAG CTCAGAGAAC CTTCGAAAAA CCGCCCTGCA 4077

AGGCGGTTTT TTCGTTTTCA GAGCAAGAGA TTACGCGCAG ACCAAAACGA TCTCAAGAAG 4137

ATCATCTTAT TAATCAGATA AAATATTTCT AGAGGTGAAC CATCACCCTA ATCAAGTTTT 4197

TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA AAGGGATGCC CCGATTTAGA 4257

GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG GGAAGAAAGC GAAAGGAGCG 4317

GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG TAACCACCAC ACCCGCCGCG 4377

CTTAATGCGC CGCTACAGCG CCATTCGCCA TTCAGGCTGC GCAACTGTTG GGAAGGGCGA 4437

TCGGTGCGGG CCTCTTCGCT ATTACGCCAG CTGGCGAAAG GGGGATGTGC TGCAAGGCGA 4497

TTAAGTTGGG TAACGCCAGG GTTTTCCCAG TCACGACGTT GTAAAACGAC GGCCAGTGCC 4557

AAGCTTACTT GTGTATAAGA GTCAGTCGAC CTGCAGGGGG GGGGGGGAAA GCCACGTTGT 4617

GTCTCAAAAT CTCTGATGTT ACATTGCACA AGATAAAAAT ATATCATCAT GAACAATAAA 4677

ACTGTCTGCT TACATAAACA GTAATACAAG GGGTGTT ATG AGC CAT ATT CAA CGG 4732
Met Ser His Ile Gln Arg
1 5

GAA ACG TCT TGC TCG AGG CCG CGA TTA AAT TCC AAC ATG GAT GCT GAT 4780
Glu Thr Ser Cys Ser Arg Pro Arg Leu Asn Ser Asn Met Asp Ala Asp
10 15 20

TTA TAT GGG TAT AAA TGG GCT CGC GAT AAT GTC GGG CAA TCA GGT GCG 4828
Leu Tyr Gly Tyr Lys Trp Ala Arg Asp Asn Val Gly Gln Ser Gly Ala
25 30 35

ACA ATC TAT CGA TTG TAT GGG AAG CCC GAT GCG CCA GAG TTG TTT CTG 4876
Thr Ile Tyr Arg Leu Tyr Gly Lys Pro Asp Ala Pro Glu Leu Phe Leu
40 45 50

AAA CAT GGC AAA GGT AGC GTT GCC AAT GAT GTT ACA GAT GAG ATG GTC 4924
Lys His Gly Lys Gly Ser Val Ala Asn Asp Val Thr Asp Glu Met Val
55 60 65 70

AGA CTA AAC TGG CTG ACG GAA TTT ATG CCT CTT CCG ACC ATC AAG CAT 4972
Arg Leu Asn Trp Leu Thr Glu Phe Met Pro Leu Pro Thr Ile Lys His
75 80 85

TTT ATC CGT ACT CCT GAT GAT GCA TGG TTA CTC ACC ACT GCG ATC CCC 5020
Phe Ile Arg Thr Pro Asp Asp Ala Trp Leu Leu Thr Thr Ala Ile Pro
90 95 100

GGG AAA ACA GCA TTC CAG GTA TTA GAA GAA TAT CCT GAT TCA GGT GAA 5068
Gly Lys Thr Ala Phe Gln Val Leu Glu Glu Tyr Pro Asp Ser Gly Glu
105 110 115

AAT ATT GTT GAT GCG CTG GCA GTG TTC CTG CGC CGG TTG CAT TCG ATT 5116
Asn Ile Val Asp Ala Leu Ala Val Phe Leu Arg Arg Leu His Ser Ile
120 125 130

CCT GTT TGT AAT TGT CCT TTT AAC AGC GAT CGC GTA TTT CGT CTC GCT 5164
Pro Val Cys Asn Cys Pro Phe Asn Ser Asp Arg Val Phe Arg Leu Ala
135 140 145 150

CAG GCG CAA TCA CGA ATG AAT AAC GGT TTG GTT GAT GCG AGT GAT TTT 5212
Gln Ala Gln Ser Arg Met Asn Asn Gly Leu Val Asp Ala Ser Asp Phe
155 160 165

GAT GAC GAG CGT AAT GGC TGG CCT GTT GAA CAA GTC TGG AAA GAA ATG 5260
Asp Asp Glu Arg Asn Gly Trp Pro Val Glu Gln Val Trp Lys Glu Met
170 175 180

CAT AAG CTT TTG CCA TTC TCA CCG GAT TCA GTC GTC ACT CAT GGT GAT 5308
His Lys Leu Leu Pro Phe Ser Pro Asp Ser Val Val Thr His Gly Asp
185 190 195

TTC TCA CTT GAT AAC CTT ATT TTT GAC GAG GGG AAA TTA ATA GGT TGT 5356
Phe Ser Leu Asp Asn Leu Ile Phe Asp Glu Gly Lys Leu Ile Gly Cys
200 205 210

ATT GAT GTT GGA CGA GTC GGA ATC GCA GAC CGA TAC CAG GAT CTT GCC 5404
Ile Asp Val Gly Arg Val Gly Ile Ala Asp Arg Tyr Gln Asp Leu Ala
215 220 225 230

ATC CTA TGG AAC TGC CTC GGT GAG TTT TCT CCT TCA TTA CAG AAA CGG 5452
Ile Leu Trp Asn Cys Leu Gly Glu Phe Ser Pro Ser Leu Gln Lys Arg
235 240 245

CTT TTT CAA AAA TAT GGT ATT GAT AAT CCT GAT ATG AAT AAA TTG CAG 5500
Leu Phe Gln Lys Tyr Gly Ile Asp Asn Pro Asp Met Asn Lys Leu Gln
250 255 260

TTT CAT TTG ATG CTC GAT GAG TTT TTC TAA TCAGAATTGG TTAATTGGTT 5550
Phe His Leu Met Leu Asp Glu Phe Phe *
265 270

GTAACACTGG CAGAGCATTA CGCTGACTTG ACGGGACGGC GGCTTTGTTG AATAAATCGA 5610

ACTTTTGCTG AGTTGAAGGA TCAGATCACG CATCTTCCCG ACAACGCAGA CCGTTCCGTG 5670

GCAAAGCAAA AGTTCAAAAT CACCAACTGG TCCACCTACA ACAAAGCTCT CATCAACCGT 5730

GGCTCCCTCA CTTTCTGGCT GGATGATGGG GCGATTCAGG CCTGGTATGA GTCAGCAACA 5790

CCTTCTTCAC GAGGCAGACC TCAGCGCCCC CCCCCCCCTG CAGGTCGA 5838

(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 397 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu Gly Thr Val Thr Leu
1 5 10 15

Asp Ala Val Gly Ile Gly Leu Val Met Pro Val Leu Pro Gly Leu Leu
20 25 30

Arg Asp Ile Val His Ser Asp Ser Ile Ala Ser His Tyr Gly Val Leu
35 40 45

Leu Ala Leu Tyr Ala Leu Met Gln Phe Leu Cys Ala Pro Val Leu Gly
50 55 60

Ala Leu Ser Asp Arg Phe Gly Arg Arg Pro Val Leu Leu Ala Ser Leu
65 70 75 80

Leu Gly Ala Thr Ile Asp Tyr Ala Ile Met Ala Thr Thr Pro Val Leu
85 90 95

Trp Ile Leu Tyr Ala Gly Arg Ile Val Ala Gly Ile Thr Gly Ala Thr
100 105 110

Gly Ala Val Ala Gly Ala Tyr Ile Ala Asp Ile Thr Asp Gly Glu Asp
115 120 125

Arg Ala Arg His Phe Gly Leu Met Ser Ala Cys Phe Gly Val Gly Met
130 135 140

Val Ala Gly Pro Val Ala Gly Gly Leu Leu Gly Ala Ile Ser Leu His
145 150 155 160

Ala Pro Phe Leu Ala Ala Ala Val Leu Asn Gly Leu Asn Leu Leu Leu
165 170 175

Gly Cys Phe Leu Met Gln Glu Ser His Lys Gly Glu Arg Arg Pro Met
180 185 190

Pro Leu Arg Ala Phe Asn Pro Val Ser Ser Phe Arg Trp Ala Arg Gly
195 200 205

Met Thr Ile Val Ala Ala Leu Met Thr Val Phe Phe Ile Met Gln Leu
210 215 220

Val Gly Gln Val Pro Ala Ala Leu Trp Val Ile Phe Gly Glu Asp Arg
225 230 235 240

Phe Arg Trp Ser Ala Thr Met Ile Gly Leu Ser Leu Ala Val Phe Gly
245 250 255

Ile Leu His Ala Leu Ala Gln Ala Phe Val Thr Gly Pro Ala Thr Lys
260 265 270

Arg Phe Gly Glu Lys Gln Ala Ile Ile Ala Gly Met Ala Ala Asp Ala
275 280 285

Leu Gly Tyr Val Leu Leu Ala Phe Ala Thr Arg Gly Trp Met Ala Phe
290 295 300

Pro Ile Met Ile Leu Leu Ala Ser Gly Gly Ile Gly Met Pro Ala Leu
305 310 315 320

Gln Ala Met Leu Ser Arg Gln Val Asp Asp Asp His Gln Gly Gln Leu
325 330 335

Gln Gly Ser Leu Ala Ala Leu Thr Ser Leu Thr Ser Ile Thr Gly Pro
340 345 350

Leu Ile Val Thr Ala Ile Tyr Ala Ala Ser Ala Ser Thr Trp Asn Gly
355 360 365

Leu Ala Trp Ile Val Gly Ala Ala Leu Tyr Leu Val Cys Leu Pro Ala
370 375 380

Leu Arg Arg Gly Ala Trp Ser Arg Ala Thr Ser Thr *
385 390 395

(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 220 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Glu Lys Lys Ile Thr Gly Tyr Thr Thr Val Asp Ile Ser Gln Trp
1 5 10 15

His Arg Lys Glu His Phe Glu Ala Phe Gln Ser Val Ala Gln Cys Thr
20 25 30

Tyr Asn Gln Thr Val Gln Leu Asp Ile Thr Ala Phe Leu Lys Thr Val
35 40 45

Lys Lys Asn Lys His Lys Phe Tyr Pro Ala Phe Ile His Ile Leu Ala
50 55 60

Arg Leu Met Asn Ala His Pro Glu Phe Arg Met Ala Met Lys Asp Gly
65 70 75 80

Glu Leu Val Ile Trp Asp Ser Val His Pro Cys Tyr Thr Val Phe His
85 90 95

Glu Gln Thr Glu Thr Phe Ser Ser Leu Trp Ser Glu Tyr His Asp Asp
100 105 110

Phe Arg Gln Phe Leu His Ile Tyr Ser Gln Asp Val Ala Cys Tyr Gly
115 120 125

Glu Asn Leu Ala Tyr Phe Pro Lys Gly Phe Ile Glu Asn Met Phe Phe
130 135 140

Val Ser Ala Asn Pro Trp Val Ser Phe Thr Ser Phe Asp Leu Asn Val
145 150 155 160

Ala Asn Met Asp Asn Phe Phe Ala Pro Val Phe Thr Met Gly Lys Tyr
165 170 175

Tyr Thr Gln Gly Asp Lys Val Leu Met Pro Leu Ala Ile Gln Val His
180 185 190

His Ala Val Cys Asp Gly Phe His Val Gly Arg Met Leu Asn Glu Leu
195 200 205

Gln Gln Tyr Cys Asp Glu Trp Gln Gly Gly Ala *
210 215 220

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 272 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Ser His Ile Gln Arg Glu Thr Ser Cys Ser Arg Pro Arg Leu Asn
1 5 10 15

Ser Asn Met Asp Ala Asp Leu Tyr Gly Tyr Lys Trp Ala Arg Asp Asn
20 25 30

Val Gly Gln Ser Gly Ala Thr Ile Tyr Arg Leu Tyr Gly Lys Pro Asp
35 40 45

Ala Pro Glu Leu Phe Leu Lys His Gly Lys Gly Ser Val Ala Asn Asp
50 55 60

Val Thr Asp Glu Met Val Arg Leu Asn Trp Leu Thr Glu Phe Met Pro
65 70 75 80

Leu Pro Thr Ile Lys His Phe Ile Arg Thr Pro Asp Asp Ala Trp Leu
85 90 95

Leu Thr Thr Ala Ile Pro Gly Lys Thr Ala Phe Gln Val Leu Glu Glu
100 105 110

Tyr Pro Asp Ser Gly Glu Asn Ile Val Asp Ala Leu Ala Val Phe Leu
115 120 125

Arg Arg Leu His Ser Ile Pro Val Cys Asn Cys Pro Phe Asn Ser Asp
130 135 140

Arg Val Phe Arg Leu Ala Gln Ala Gln Ser Arg Met Asn Asn Gly Leu
145 150 155 160

Val Asp Ala Ser Asp Phe Asp Asp Glu Arg Asn Gly Trp Pro Val Glu
165 170 175

Gln Val Trp Lys Glu Met His Lys Leu Leu Pro Phe Ser Pro Asp Ser
180 185 190

Val Val Thr His Gly Asp Phe Ser Leu Asp Asn Leu Ile Phe Asp Glu
195 200 205

Gly Lys Leu Ile Gly Cys Ile Asp Val Gly Arg Val Gly Ile Ala Asp
210 215 220

Arg Tyr Gln Asp Leu Ala Ile Leu Trp Asn Cys Leu Gly Glu Phe Ser
225 230 235 240

Pro Ser Leu Gln Lys Arg Leu Phe Gln Lys Tyr Gly Ile Asp Asn Pro
245 250 255

Asp Met Asn Lys Leu Gln Phe His Leu Met Leu Asp Glu Phe Phe *
260 265 270
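
The CDS coordinates in the feature tables of SEQ ID NOs: 1 and 3 can be cross-checked against the protein lengths stated for SEQ ID NOs: 2 and 4 to 6. The following Python sketch is illustrative only and forms no part of the sequence listing. The function name `cds_codon_count` is hypothetical, the pairing of each CDS with a protein entry is inferred from the listed sequences, and the check assumes that each stated length counts the terminal stop position (shown as `*`) as one position.

```python
def cds_codon_count(start, end):
    """Return the number of codons in a CDS spanning start..end (1-based, inclusive)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS {start}..{end} is {length} nt, not a multiple of 3")
    return length // 3


# (label, CDS start, CDS end, stated protein length including the terminal '*')
FEATURES = [
    ("SEQ ID NO: 1 CDS 93..1523 vs SEQ ID NO: 2", 93, 1523, 477),
    ("SEQ ID NO: 3 CDS 77..1267 vs SEQ ID NO: 4", 77, 1267, 397),
    ("SEQ ID NO: 3 CDS complement (2301..2960) vs SEQ ID NO: 5", 2301, 2960, 220),
    ("SEQ ID NO: 3 CDS 4715..5530 vs SEQ ID NO: 6", 4715, 5530, 272),
]

for label, start, end, stated in FEATURES:
    codons = cds_codon_count(start, end)
    status = "consistent" if codons == stated else "MISMATCH"
    print(f"{label}: {codons} codons, stated {stated} ({status})")
```

Under the stated assumption, each CDS span divides evenly into codons and agrees with the corresponding stated length.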

(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 wild type outside end"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTGACTCTTA TACACAAGT 19 
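
The feature table of SEQ ID NO: 3 lists two insertion_seq elements, at positions 1..19 and 4564..4582. The following Python sketch is illustrative only and forms no part of the sequence listing. It suggests that both elements carry the wild type outside end of SEQ ID NO: 7, the second in inverted orientation. The two 19-base strings are transcribed by hand from the SEQ ID NO: 3 listing and should be treated as assumptions; the helper name `reverse_complement` is hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7
LEFT_END = "CTGACTCTTATACACAAGT"      # SEQ ID NO: 3, positions 1..19 (hand-transcribed)
RIGHT_END = "ACTTGTGTATAAGAGTCAG"     # SEQ ID NO: 3, positions 4564..4582 (hand-transcribed)

print("left end equals OE:", LEFT_END == WILD_TYPE_OE)
print("right end is inverted OE:", reverse_complement(RIGHT_END) == WILD_TYPE_OE)
```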

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 mutant outside end"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

CTGTCTCTTA TACACATCT 19

(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 mutant outside end"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CTGTCTCTTA TACAGATCT 19

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 wild type inside end"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CTGTCTCTTG ATCAGATCT 19 
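
The four 19-base-pair end sequences of SEQ ID NOs: 7 to 10 can be compared position by position. The following Python sketch is illustrative only and forms no part of the sequence listing. It reports the 1-based positions at which two ends differ; the function name `mismatch_positions` and the constant names are hypothetical.

```python
def mismatch_positions(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


OE_WILD_TYPE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7, Tn5 wild type outside end
OE_MUTANT_8 = "CTGTCTCTTATACACATCT"   # SEQ ID NO: 8, Tn5 mutant outside end
OE_MUTANT_9 = "CTGTCTCTTATACAGATCT"   # SEQ ID NO: 9, Tn5 mutant outside end
IE_WILD_TYPE = "CTGTCTCTTGATCAGATCT"  # SEQ ID NO: 10, Tn5 wild type inside end

for name, seq in [("SEQ ID NO: 8", OE_MUTANT_8), ("SEQ ID NO: 9", OE_MUTANT_9)]:
    print(name, "differs from the wild type OE at", mismatch_positions(seq, OE_WILD_TYPE))
    print(name, "differs from the wild type IE at", mismatch_positions(seq, IE_WILD_TYPE))

print("wild type OE and IE differ at", mismatch_positions(OE_WILD_TYPE, IE_WILD_TYPE))
```

On these inputs, each mutant end matches the wild type outside end at some of the positions where the wild type ends differ and the wild type inside end at the others.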

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19182 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Plasmid pRZ4196"
(ix) FEATURE:
(A) NAME/KEY: repeat_unit
(B) LOCATION: 94..112
(D) OTHER INFORMATION: /note= "Wild type OE sequence"
(ix) FEATURE:
(A) NAME/KEY: repeat_unit
(B) LOCATION: 12184..12225
(D) OTHER INFORMATION: /note= "Cassette IE"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TTCCTGTAAC AATAGCAATA CCCCAAATAC CTAATGTAGT TCCAGCAAGC AAGCTAAAAA 60

GTAAAGCAAC AACATAACTC ACCCCTGCAT CTGCTGACTC TTATACACAA GTAGCGTCCC 120

GGGATCGGGA TCCCGTCGTT TTACAACGTC GTGACTGGGA AAACCCTGGC GTTACCCAAC 180

TTAATCGCCT TGCAGCACAT CCCCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA 240

CCGATCGCCC TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATGGCGCTTT GCCTGGTTTC 300

CGGCACCAGA AGCGGTGCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACTG 360

TCGTCGTCCC CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA 420

CCTATCCCAT TACGGTCAAT CCGCCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTACT 480

CGCTCACATT TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG 540

ATGGCGTTAA CTCGGCGTTT CATCTGTGGT GCAACGGGCG CTGGGTCGGT TACGGCCAGG 600

ACAGTCGTTT GCCGTCTGAA TTTGACCTGA GCGCATTTTT ACGCGCCGGA GAAAACCGCC 660

TCGCGGTGAT GGTGCTGCGT TGGAGTGACG GCAGTTATCT GGAAGATCAG GATATGTGGC 720

GGATGAGCGG CATTTTCCGT GACGTCTCGT TGCTGCATAA ACCGACTACA CAAATCAGCG 780

ATTTCCATGT TGCCACTCGC TTTAATGATG ATTTCAGCCG CGCTGTACTG GAGGCTGAAG 840

TTCAGATGTG CGGCGAGTTG CGTGACTACC TACGGGTAAC AGTTTCTTTA TGGCAGGGTG 900

AAACGCAGGT CGCCAGCGGC ACCGCGCCTT TCGGCGGTGA AATTATCGAT GAGCGTGGTG 960

GTTATGCCGA TCGCGTCACA CTACGTCTGA ACGTCGAAAA CCCGAAACTG TGGAGCGCCG 1020

AAATCCCGAA TCTCTATCGT GCGGTGGTTG AACTGCACAC CGCCGACGGC ACGCTGATTG 1080

AAGCAGAAGC CTGCGATGTC GGTTTCCGCG AGGTGCGGAT TGAAAATGGT CTGCTGCTGC 1140

TGAACGGCAA GCCGTTGCTG ATTCGAGGCG TTAACCGTCA CGAGCATCAT CCTCTGCATG 1200

GTCAGGTCAT GGATGAGCAG ACGATGGTGC AGGATATCCT GCTGATGAAG CAGAACAACT 1260

TTAACGCCGT GCGCTGTTCG CATTATCCGA ACCATCCGCT GTGGTACACG CTGTGCGACC 1320

GCTACGGCCT GTATGTGGTG GATGAAGCCA ATATTGAAAC CCACGGCATG GTGCCAATGA 1380

ATCGTCTGAC CGATGATCCG CGCTGGCTAC CGGCGATGAG CGAACGCGTA ACGCGAATGG 1440

TGCAGCGCGA TCGTAATCAC CCGAGTGTGA TCATCTGGTC GCTGGGGAAT GAATCAGGCC 1500

ACGGCGCTAA TCACGACGCG CTGTATCGCT GGATCAAATC TGTCGATCCT TCCCGCCCGG 1560

TGCAGTATGA AGGCGGCGGA GCCGACACCA CGGCCACCGA TATTATTTGC CCGATGTACG 1620

CGCGCGTGGA TGAAGACCAG CCCTTCCCGG CTGTGCCGAA ATGGTCCATC AAAAAATGGC 1680

TTTCGCTACC TGGAGAGACG CGCCCGCTGA TCCTTTGCGA ATACGCCCAC GCGATGGGTA 1740

ACAGTCTTGG CGGTTTCGCT AAATACTGGC AGGCGTTTCG TCAGTATCCC CGTTTACAGG 1800

GCGGCTTCGT CTGGGACTGG GTGGATCAGT CGCTGATTAA ATATGATGAA AACGGCAACC 1860

CGTGGTCGGC TTACGGCGGT GATTTTGGCG ATACGCCGAA CGATCGCCAG TTCTGTATGA 1920

ACGGTCTGGT CTTTGCCGAC CGCACGCCGC ATCCAGCGCT GACGGAAGCA AAACACCAGC 1980

AGCAGTTTTT CCAGTTCCGT TTATCCGGGC AAACCATCGA AGTGACCAGC GAATACCTGT 2040

TCCGTCATAG CGATAACGAG CTCCTGCACT GGATGGTGGC GCTGGATGGT AAGCCGCTGG 2100

CAAGCGGTGA AGTGCCTCTG GATGTCGCTC CACAAGGTAA ACAGTTGATT GAACTGCCTG 2160

AACTACCGCA GCCGGAGAGC GCCGGGCAAC TCTGGCTCAC AGTACGCGTA GTGCAACCGA 2220

ACGCGACCGC ATGGTCAGAA GCCGGGCACA TCAGCGCCTG GCAGCAGTGG CGTCTGGCGG 2280

AAAACCTCAG TGTGACGCTC CCCGCCGCGT CCCACGCCAT CCCGCATCTG ACCACCAGCG 2340

AAATGGATTT TTGCATCGAG CTGGGTAATA AGCGTTGGCA ATTTAACCGC CAGTCAGGCT 2400

TTCTTTCACA GATGTGGATT GGCGATAAAA AACAACTGCT GACGCCGCTG CGCGATCAGT 2460

TCACCCGTGC ACCGCTGGAT AACGACATTG GCGTAAGTGA AGCGACCCGC ATTGACCCTA 2520

ACGCCTGGGT CGAACGCTGG AAGGCGGCGG GCCATTACCA GGCCGAAGCA GCGTTGTTGC 2580

AGTGCACGGC AGATACACTT GCTGATGCGG TGCTGATTAC GACCGCTCAC GCGTGGCAGC 2640

ATCAGGGGAA AACCTTATTT ATCAGCCGGA AAACCTACCG GATTGATGGT AGTGGTCAAA 2700

TGGCGATTAC CGTTGATGTT GAAGTGGCGA GCGATACACC GCATCCGGCG CGGATTGGCC 2760

TGAACTGCCA GCTGGCGCAG GTAGCAGAGC GGGTAAACTG GCTCGGATTA GGGCCGCAAG 2820

AAAACTATCC CGACCGCCTT ACTGCCGCCT GTTTTGACCG CTGGGATCTG CCATTGTCAG 2880

ACATGTATAC CCCGTACGTC TTCCCGAGCG AAAACGGTCT GCGCTGCGGG ACGCGCGAAT 2940

TGAATTATGG CCCACACCAG TGGCGCGGCG ACTTCCAGTT CAACATCAGC CGCTACAGTC 3000

AACAGCAACT GATGGAAACC AGCCATCGCC ATCTGCTGCA CGCGGAAGAA GGCACATGGC 3060

TGAATATCGA CGGTTTCCAT ATGGGGATTG GTGGCGACGA CTCCTGGAGC CCGTCAGTAT 3120

CGGCGGATTC CAGCTGAGCG CCGGTCGCTA CCATTACCAG TTGGTCTGGT GTCAAAAATA 3180

ATAATAACCG GGCAGGCCAT GTCTGCCCGT ATTTCGCGTA AGGAAATCCA TTATGTACTA 3240

TTTAAAAAAC ACAAACTTTT GGATGTTCGG TTTATTCTTT TTCTTTTACT TTTTTATCAT 3300

GGGAGCCTAC TTCCCGTTTT TCCCGATTTG GCTACATGAC ATCAACCATA TCAGCAAAAG 3360

TGATACGGGT ATTATTTTTG CCGCTATTTC TCTGTTCTCG CTATTATTCC AACCGCTGTT 3420

TGGTCTGCTT TCTGACAAAC TCGGGCTGCG CAAATACCTG CTGTGGATTA TTACCGGCAT 3480

GTTAGTGATG TTTGCGCCGT TCTTTATTTT TATCTTCGGG CCACTGTTAC AATACAACAT 3540

TTTAGTAGGA TCGATTGTTG GTGGTATTTA TCTAGGCTTT TGTTTTAACG CCGGTGCGCC 3600

AGCAGTAGAG GCATTTATTG AGAAAGTCAG CCGTCGCAGT AATTTCGAAT TTGGTCGCGC 3660

GCGGATGTTT GGCTGTGTTG GCTGGGCGCT GTGTGCCTCG ATTGTCGGCA TCATGTTCAC 3720

CATCAATAAT CAGTTTGTTT TCTGGCTGGG CTCTGGCTGT GCACTCATCC TCGCCGTTTT 3780

ACTCTTTTTC GCCAAAACGG ATGCGCCCTC TTCTGCCACG GTTGCCAATG CGGTAGGTGC 3840

CAACCATTCG GCATTTAGCC TTAAGCTGGC ACTGGAACTG TTCAGACAGC CAAAACTGTG 3900

GTTTTTGTCA CTGTATGTTA TTGGCGTTTC CTGCACCTAC GATGTTTTTG ACCAACAGTT 3960

TGCTAATTTC TTTACTTCGT TCTTTGCTAC CGGTGAACAG GGTACGCGGG TATTTGGCTA 4020

CGTAACGACA ATGGGCGAAT TACTTAACGC CTCGATTATG TTCTTTGCGC CACTGATCAT 4080

TAATCGCATC GGTGGGAAAA ACGCCCTGCT GCTGGCTGGC ACTATTATGT CTGTACGTAT 4140

TATTGGCTCA TCGTTCGCCA CCTCAGCGCT GGAAGTGGTT ATTCTGAAAA CGCTGCATAT 4200

GTTTGAAGTA CCGTTCCTGC TGGTGGGCTG CTTTAAATAT ATTACCAGCC AGTTTGAAGT 4260

GCGTTTTTCA GCGACGATTT ATCTGGTCTG TTTCTGCTTC TTTAAGCAAC TGGCGATGAT 4320

TTTTATGTCT GTACTGGCGG GCAATATGTA TGAAAGCATC GGTTTCCAGG GCGCTTATCT 4380

GGTGCTGGGT CTGGTGGCGC TGGGCTTCAC CTTAATTTCC GTGTTCACGC TTAGCGGCCC 4440

CGGCCCGCTT TCCCTGCTGC GTCGTCAGGT GAATGAAGTC GCTTAAGCAA TCAATGTCGG 4500

ATGCGGCGCG ACGCTTATCC GACCAACATA TCATAACGGA GTGATCGCAT TGAACATGCC 4560

AATGACCGAA AGAATAAGAG CAGGCAAGCT ATTTACCGAT ATGTGCGAAG GCTTACCGGA 4620

AAAAAGACTT CGTGGGAAAA CGTTAATGTA TGAGTTTAAT CACTCGCATC CATCAGAAGT 4680

TGAAAAAAGA GAAAGCCTGA TTAAAGAAAT GTTTGCCACG GTAGGGGAAA ACGCCTGGGT 4740

AGAACCGCCT GTCTATTTCT CTTACGGTTC CAACATCCAT ATAGGCCGCA ATTTTTATGC 4800

AAATTTCAAT TTAACCATTG TCGATGACTA CACGGTAACA ATCGGTGATA ACGTACTGAT 4860

TGCACCCAAC GTTACTCTTT CCGTTACGGG ACACCCTGTA CACCATGAAT TGAGAAAAAA 4920

CGGCGAGATG TACTCTTTTC CGATAACGAT TGGCAATAAC GTCTGGATCG GAAGTCATGT 4 98 0 

GGTTATTAAT CCAGGCGTCA CCATCGGGGA TAATTCTGTT ATTGGCGCGG GTAGTATCGT 5040

CACAAAAGAC ATTCCACCAA ACGTCGTGGC GGCTGGCGTT CCTTGTCGGG TTATTCGCGA 5100

AATAAACGAC CGGGATAAGC ACTATTATTT CAAAGATTAT AAAGTTGAAT CGTCAGTTTA 5160

AATTATAAAA ATTGCCTGAT ACGCTGCGCT TATCAGGCCT ACAAGTTCAG CGATCTACAT 5220

TAGCCGCATC CGGCATGAAC AAAGCGCAGG AACAAGCGTC GCATCATGCC TCTTTGACCC 5280

ACAGCTGCGG AAAACGTACT GGTGCAAAAC GCAGGGTTAT GATCATCAGC CCAACGACGC 5340

ACAGCGCATG AAATGCCCAG TCCATCAGGT AATTGCCGCT GATACTACGC AGCACGCCAG 5400

AAAACCACGG GGCAAGCCCG GCGATGATAA AACCGATTCC CTGCATAAAC GCCACCAGCT 5460

TGCCAGCAAT AGCCGGTTGC ACAGAGTGAT CGAGCGCCAG CAGCAAACAG AGCGGAAACG 5520

CGCCGCCCAG ACCTAACCCA CACACCATCG CCCACAATAC CGGCAATTGC ATCGGCAGCC 5580

AGATAAAGCC GCAGAACCCC ACCAGTTGTA ACACCAGCGC CAGCATTAAC AGTTTGCGCC 5640

GATCCTGATG GCGAGCCATA GCAGGCATCA GCAAAGCTCC TGCGGCTTGC CCAAGCGTCA 5700

TCAATGCCAG TAAGGAACCG CTGTACTGCG CGCTGGCACC AATCTCAATA TAGAAAGCGG 5760

GTAACCAGGC AATCAGGCTG GCGTAACCGC CGTTAATCAG ACCGAAGTAA ACACCCAGCG 5820

TCCACGCGCG GGGAGTGAAT ACCACGCGAA CCGGAGTGGT TGTTGTCTTG TGGGAAGAGG 5880

CGACCTCGCG GGCGCTTTGC CACCACCAGG CAAAGAGCGC AACAACGGCA GGCAGCGCCA 5940

CCAGGCGAGT GTTTGATACC AGGTTTCGCT ATGTTGAACT AACCAGGGCG TTATGGCGGC 6000

ACCAAGCCCA CCGCCGCCCA TCAGAGCCGC GGACCACAGC CCCATCACCA GTGGCGTGCG 6060

CTGCTGAAAC CGCCGTTTAA TCACCGAAGC ATCACCGCCT GAATGATGCC GATCCCCACC 6120

CCACCAAGCA GTGCGCTGCT AAGCAGCAGC GCACTTTGCG GGTAAAGCTC ACGCATCAAT 6180

GCACCGACGG CAATCAGCAA CAGACTGATG GCGACACTGC GACGTTCGCT GACATGCTGA 6240

TGAAGCCAGC TTCCGGCCAG CGCCAGCCCG CCCATGGTAA CCACCGGCAG AGCGGTCGAC 6300

CCGGACGGGA CGCTCCTGCG CCTGATACAG AACGAATTGC TTGCAGGCAT CTCATGAGTG 6360

TGTCTTCCCG TTTTCCGCCT GAGGTCACTG CGTGGATGGA GCGCTGGCGC CTGCTGCGCG 6420

ACGGCGAGCT GCTCACCACC CACTCGAGCT GGATACTTCC CGTCCGCCAG GGGGACATGC 6480

CGGCGATGCT GAAGGTCGCG CGCATTCCCG ATGAAGAGGC CGGTTACCGC CTGTTGACCT 6540

GGTGGGACGG GCAGGGCGCC GCCCGAGTCT TCGCCTCGGC GGCGGGCGCT CTGCTCATGG 6600

AGCGCGCGTC CGGGGCCGGG GACCTTGCAC AGATAGCGTG GTCCGGCCAG GACGACGAGG 6660

CTTGCAGGAT CTATGATTCC CTTTGTCAAC AGCAATGGAT CACTGAAAAT GGTTCAATGA 6720

TCACATTAAG TGGTATTCAA TATTTTCATG AAATGGGAAT TGACGTTCCT TCCAAACATT 6780

CACGTAAAAT CTGTTGTGCG TGTTTAGATT GGAGTGAACG CCGTTTCCAT TTAGGTGGGT 6840

ACGTTGGAGC CGCATTATTT TCGCTTTATG AATCTAAAGG GTGGTTAACT CGACATCTTG 6900

GTTACCGTGA AGTTACCATC ACGGAAAAAG GTTATGCTGC TTTTAAGACC CACTTTCACA 6960

TTTAAGTTGT TTTTCTAATC CGCATATGAT CAATTCAAGG CCGAATAAGA AGGCTGGCTC 7020

TGCACCTTGG TGATCAAATA ATTCGATAGC TTGTCGTAAT AATGGCGGCA TACTATCAGT 7080

AGTAGGTGTT TCCCTTTCTT CTTTAGCGAC TTGATGCTCT TGATCTTCCA ATACGCAACC 7140

TAAAGTAAAA TGCCCCACAG CGCTGAGTGC ATATAATGCA TTCTCTAGTG AAAAACCTTG 7200

TTGGCATAAA AAGGCTAATT GATTTTCGAG AGTTTCATAC TGTTTTTCTG TAGGCCGTGT 7260

ACCTAAATGT ACTTTTGCTC CATCGCGATG ACTTAGTAAA GCACATCTAA AACTTTTAGC 7320

GTTATTACGT AAAAAATCTT GCCAGCTTTC CCCTTCTAAA GGGCAAAAGT GAGTATGGTG 7380

CCTATCTAAC ATCTCAATGG CTAAGGCGTC GAGCAAAGCC CGCTTATTTT TTACATGCCA 7440

ATACAATGTA GGCTGCTCTA CACCTAGCTT CTGGGCGAGT TTACGGGTTG TTAAACCTTC 7500

GATTCCGACC TCATTAAGCA GCTCTAATGC GCTGTTAATC ACTTTACTTT TATCTAATCT 7560

AGACATCATT AATTCCTAAT TTTTGTTGAC ACTCTATCAT TGATAGAGTT ATTTTACCAC 7620

TCCCTATCAG TGATAGAGAA AAGTGAAATG AATAGTTCGA CAAAGATCGC ATTGGTAATT 7680

ACGTTACTCG ATGCCATGGG GATTGGCCTT ATCATGCCAG TCTTGCCAAC GTTATTACGT 7740

GAATTTATTG CTTCGGAAGA TATCGCTAAC CACTTTGGCG TATTGCTTGC ACTTTATGCG 7800

TTAATGCAGG TTATCTTTGC TCCTTGGCTT GGAAAAATGT CTGACCGATT TGGTCGGCGC 7860

CCAGTGCTGT TGTTGTCATT AATAGGCGCA TCGCTGGATT ACTTATTGCT GGCTTTTTCA 7920

AGTGCGCTTT GGATGCTGTA TTTAGGCCGT TTGCTTTCAG GGATCACAGG AGCTACTGGG 7980

GCTGTCGCGG CATCGGTCAT TGCCGATACC ACCTCAGCTT CTCAACGCGT GAAGTGGTTC 8040

GGTTGGTTAG GGGCAAGTTT TGGGCTTGGT TTAATAGCGG GGCCTATTAT TGGTGGTTTT 8100

GCAGGAGAGA TTTCACCGCA TAGTCCCTTT TTTATCGCTG CGTTGCTAAA TATTGTCACT 8160

TTCCTTGTGG TTATGTTTTG GTTCCGTGAA ACCAAAAATA CACGTGATAA TACAGATACC 8220

GAAGTAGGGG TTGAGACGCA ATCGAATTCG GTATACATCA CTTTATTTAA AACGATGCCC 8280

ATTTTGTTGA TTATTTATTT TTCAGCGCAA TTGATAGGCC AAATTCCCGC AACGGTGTGG 8340

GTGCTATTTA CCGAAAATCG TTTTGGATGG AATAGCATGA TGGTTGGCTT TTCATTAGCG 8400

GGTCTTGGTC TTTTACACTC AGTATTCCAA GCCTTTGTGG CAGGAAGAAT AGCCACTAAA 8460

TGGGGCGAAA AAACGGCAGT ACTGCTCGAA TTTATTGCAG ATAGTAGTGC ATTTGCCTTT 8520

TTAGCGTTTA TATCTGAAGG TTGGTTAGAT TTCCCTGTTT TAATTTTATT GGCTGGTGGT 8580

GGGATCGCTT TACCTGCATT ACAGGGAGTG ATGTCTATCC AAACAAAGAG TCATGAGCAA 8640

GGTGCTTTAC AGGGATTATT GGTGAGCCTT ACCAATGCAA CCGGTGTTAT TGGCCCATTA 8700

CTGTTTACTG TTATTTATAA TCATTCACTA CCAATTTGGG ATGGCTGGAT TTGGATTATT 8760

GGTTTAGCGT TTTACTGTAT TATTATCCTG CTATCGATGA CCTTCATGTT AACCCCTCAA 8820

GCTCAGGGGA GTAAACAGGA GACAAGTGCT TAGTTATTTC GTCACCAAAT GATGTTATTC 8880

CGCGAAATAT AATGACCCTC TTGATAACCC AAGAGGGCAT TTTTTACGAT AAAGAAGATT 8940

TAGCTTCAAA TAAAACCTAT CTATTTTATT TATCTTTCAA GCTCAATAAA AAGCCGCGGT 9000

AAATAGCAAT AAATTGGCCT TTTTTATCGG CAAGCTCTTT TAGGTTTTTC GCATGTATTG 9060

CGATATGCAT AAACCAGCCA TTGAGTAAGT TTTTAAGCAC ATCACTATCA TAAGCTTTAA 9120

GTTGGTTCTC TTGGATCAAT TTGCTGACAA TGGCGTTTAC CTTACCAGTA ATGTATTCAA 9180

GGCTAATTTT TTCAAGTTCA TTCCAACCAA TGATAGGCAT CACTTCTTGG ATAGGGATAA 9240

GGTTTTTATT ATTATCAATA ATATAATCAA GATAATGTTC AAATATACTT TCTAAGGCAG 9300

ACCAACCATT TGTTAAATCA GTTTTTGTTG TGATGTAGGC ATCAATCATA ATTAATTGCT 9360

GCTTATAACA GGCACTGAGT AATTGTTTTT TATTTTTAAA GTGATGATAA AAGGCACCTT 9420

TGGTCACCAA CGCTTTTCCC GAGATCCTCT GCGACACCGC CGCTCGTCTG CACGCGCCGC 9480

GGTCCGGACC GCCGCCCGAT CTCCATCCGC TACAGGAATG GTTCCAGCCG CTTTTCCGGT 9540

TGGCCGCTGA GCACGCGGCA CTTGCGCCCG CCGCCAGCGT AGCGCGCCAA CTTCTGGCGG 9600

CGCCGCGCGA GGTGTGCCCG CTCCACGGCG ACCTGCACCA CGAGAACGTG CTCGACTTCG 9660

GCGACCGCGG CTGGCTGGCC ATCGACCCGC ACGGACTGCT CGGCGAGCGC ACCTTCGACT 9720

ATGCCAACAT CTTCACGAAT CCCGATCTCA GCGACCCCGG TCGCCCGCTT GCGATCCTGC 9780

CGGGCAGGCT GGAGGCTCGA CTCAGCATTG TGGTCGCGAC GACCGGGTTT GAGCCCGAAC 9840

GGCTTCTTCG CTGGATCATT GCATGGACGG GCTTGTCGGC AGCCTGGTTC ATCGGCGACG 9900

GCGACGGCGA GGGCGAGGGC GCTGCGATTG ATCTGGCCGT AAACGCCATG GCACGCCGGT 9960

TGCTTGACTA GCGCGGTCAC CGATCTCACC TGGTCGTCGA GCTAGGTCAG GCCGTGTCGG 10020

GCGTGATCCG CTGGAAGTCG TTGCGGGCCA CACCCGCCGC CTCGAAGCCC TGCACCAGGC 10080

CGGCATCGTG GTGTGCGTGG CCGAGGGACT ATGGAAGGTG CCGGACGATC TGCCCGAGCA 10140

GGGCCGCCGC TATGACGCCC AGCGTCTTGG TGGCGTGACG GTGGAGCTGA AATCGCACCT 10200

GCCCATCGAG CGGCAGGCCC GCGTGATCGG TGCCACCTGG CTTGACCAGC AGTTGATCGA 10260

CGGTGGCTCG GGCTTGGGCG ACCTGGGCTT TAGCAGTGAG GCCAAGTAGG CGATACAGCA 10320

GCGCGCGGAC TTCCTGGCCG AACAGGGACT GGCCGAGCGG CGCGGGCAGC GCGTGATCCT 10380

CACCGGAATC TGCTGGGCAG CAGCGGGCTC GGGAACTGGC GCAGGCCGCG AAGGACATTG 10440

CCGCCGATAC CGGCCTGGAG CATCGCCCCG TGGCCGACGG CCAGCGCGTT GCCGGCGTCT 10500

ACCGGCGCCC CGTCATGCTC GCCAGCGGGC GAAATGGGAT GCTTGATGAC GCCAAGGGGT 10560

CCAGCCTCGT GCGGTGGAAG CCCATCGAAC AGCGGCTTGG GGAGCAGCTC GCCGCGACGG 10620

TGCGCGGTGG CGGCGTGTCT TGGGAGATTG GACGACAGCG TGGGCCGGCC CCTGTCTCTT 10680

GATCAGATCT TGATCCCCTG CGCCATCAGA TCCTTGGCGG CAAGAAAGCC ATCCAGTTTA 10740

CTTTGCAGGG CTTCCCAACC TTCCCAGAGG GCGCCCCAGC TGGCAATTCC GGTTCGCTTG 10800

CTGTCCATAA AACCGCCCAG TCTAGCTATC GCCATGTAAG CCCACTGCAA GCTACCTGCT 10860

TTCTCTTTGC GCTTGCGTTT TCCCTTGTCC AGATAGCCCA GTAGCTGACA TTCATCCGGG 10920

GTCAGCACCG TTTCTGCGGA CTGGCTTTCT ACGTGTTCCG CTTCCTTTAG CAGCCCTTGC 10980

GCCCTGAGTG CTTGCGGCAG CGTGAAGCTT TCTCTGAGCT GTAACAGCCT GACCGCAACA 11040

AACGAGAGGA TCGAGACCAT CCGCTCCAGA TTATCCGGCT CCTCCATGCG TTGCCTCTCG 11100

GCTCCTGCTC CGGTTTTCCA TGCCTTATGG AACTCCTCGA TCCGCCAGCG ATGGGTATAA 11160

ATGTCGATGA CGCGCAAGGC TTGGGCTAGC GACTCGACCG GTTCGCCGGT CAGCAACAAC 11220

CATTTCAACG GGGTCTCACC CTTGGGCGGG TTAATCTCCT CGGCCAGCAC CGCGTTGAGC 11280

GTGATATTCC CCTGTTTTAG CGTGATGCGC CCACTGCGCA GGCTCAAGCT CGCCTTGCGG 11340

GCTGGTCGAT TTTTACGTTT ACCGCGTTTA TCCACCACGC CCTTTTGCGG AATGCTGATC 11400

TGATAGCCAC CCAACTCCGG TTGGTTCTTC AGATGGTCGA TCAGATACAA CCCAGACTCT 11460

ACGTCCTTGC GTGGGTGCTT GGAGCGCACC ACGAAGCGCT CGTTATGCGC CAGCCTGTCC 11520

TGCAGATAAG CATGAATATC GGCTTCGCGG TCACAGACCG CAATCACGTT GCTCATCATG 11580

CTGCCCATGC GTAACCGGCT AGTTGCGGCC GCTGCCAGCC ATTTGCCACT CTCCTTTTCA 11640

TCCGCATCGG CAGGGTCATC CGGGCGCATC CACCACTCCT GATGCAGTAA TCCTACGGTG 11700

CGGAATGTGG TGGCCTCGAG CAAGAGAACG GAGTGAACCC ACCATCCGCG GGATTTATCC 11760

TGAATAGAGC CCAGCTTGCC AAGCTCTTCG GCGACCTGGT GGCGATAACT CAAAGAGGTG 11820

GTGTCCTCAA TGGCCAGCAG TTCGGGAAAC TCCTGAGCCA ACTTGACTGT TTGCATGGCG 11880

CCAGCCTTTC TGATCGCCTC GGCAGAAACG TTGGGATTGC GGATAAATCG GTAAGCGCCT 11940

TCCTGCATGG CTTCACTACC CTCTGATGAG ATGGTTATTG ATTTACCAGA ATATTTTGCC 12000

AATTGGGCGG CGACGTTAAC CAAGCGGGCA GTACGGCGAG GATCACCCAG CGCCGCCGAA 12060

GAGAACACAG ATTTAGCCCA GTCGGCCGCA CGATGAAGAG CAGAAGTTAT CATGAACGTT 12120

ACCATGTTAG GAGGTCACAT GGAAGATCAG ATCCTGGAAA ACGGGAAAGG TTCCGTTCGA 12180

ATTGCATGCG GATCCGGGAT CAAGATCTGA TCAAGAGACA GGTACCAATT GTTGAAGACG 12240

AAAGGGCCTC GTGATACGCC TATTTTTATA GGTTAATGTC ATGATAATAA TGGTTTCTTA 12300

GACGTCAGGT GGCACTTTTC GGGGAAATGT GCGCGGAACC CCTATTTGTT TATTTTTCTA 12360

AATACATTCA AATATGTATC CGCTCATGAG ACAATAACCC TGATAAATGC TTCAATAATA 12420

TTGAAAAAGG AAGAGTATGA GTATTCAACA TTTCCGTGTC GCCCTTATTC CCTTTTTTGC 12480

GGCATTTTGC CTTCCTGTTT TTGCTCACCC AGAAACGCTG GTGAAAGTAA AAGATGCTGA 12540

AGATCAGTTG GGTGCACGAG TGGGTTACAT CGAACTGGAT CTCAACAGCG GTAAGATCCT 12600

TGAGAGTTTT CGCCCCGAAG AACGTTTTCC AATGATGAGC ACTTTTAAAG TTCTGCTATG 12660

TGGCGCGGTA TTATCCCGTG TTGACGCCGG GCAAGAGCAA CTCGGTCGCC GCATACACTA 12720

TTCTCAGAAT GACTTGGTTG AGTACTTGGC AAACTGATCT AAATGTTTAG CCCAGTCATC 12780

ATACTTCACC GATGCCAACG CATTAAAAAT AGCATCACGA TCGGCTTTGC TGAATTTCTT 12840

ATTTAAAACA TCCTTGTATT TTTCAAAAGC AGCGAGAGCT TCATTCACAT TGCCGATTTT 12900

CTTACCTTTA GACTTATCAG CAAGTTCCTG TGCCATTTTC GAATATTTTT CACCATATTT 12960

TTCAGTCAGC GTTTGATAAA AGCTAACTGT TGCATCAACA GCATCCTTAA TCTGTGAATT 13020

AAGGAGATTA TTCTGTGCTT TTTTCAAATT TTCTTCAGCT TCATGAACAC GAGCGATACC 13080

GGCATTACGA TTATTACTGA CCTGAGAAAT AGCCTTCTGG ATCTGAGTTA TATCAGCATT 13140

TATCCGGTTA ATACGTGTTT CTGATGCTGT TACCTGTTTT TGTTTTTCTT CTCTAATCTT 13200

ACCGGCCCCA ACCCGTCGTC TGGTTGCTTC AAAAAAAGGA CGGTTCTGAA GCGGATCATT 13260

GGCTCTTGGT GATAGTTTTT TGACCAGCTC ATCCAGTTCT TTATATTTAG CGGATGCCTG 13320

AGCCAGTTCA TTTCGTTTTC CAGCGAGCGT TTTCATTTCT GCATCACGGG CATGGATACT 13380

GGAGCTTAAA CGAGAATTGA GAGTCTTAAT CTCTCCATCC ATTTTCACCA CTTCAGATTG 13440

TGCAGCAGAA AGTTTTTTTT GGGCGATCTC AACAGCTTTA GCTTCTTCAC TCAATGCAGC 13500

CAGTCGTTTC TCTTCAGCTT CAGCCAGTTT CAACTGGCGT TCTGTTTCAG CCTTCTCCCG 13560

TTCAATCTCT TTACGTCGTT GTTCTGCTTC CTGAAAAGCC TTTTCTGCTG CTTCCGCTTC 13620

TTTACGGGCT TTTTCTTCTG CTTTCGCAAG GCGCAAACGC TCTGCTTCCG CCTGCATAGC 13680

TGCATTATTA GCATGAGCAA GCTCTGTTGC TGAAGGCGTA CGTGAGGCAT TGTGACGAAG 13740

AGCCTCATTC ACGATATCCT TCAGGCGCTG AGTCAGCGCA TCCCTGTTTG CCTTTGCTTT 13800

CGCCTGTGCT TCCGCTGCAG CTTTTGCCCG GGCAGCCTGC TCTGCCTGTG TTTTCTTTAA 13860

TTGAGCAGTA GACCATTTAG CAGTTGCATG AATAGCTGCA GAACTTTCAC TTTTACTGCC 13920

TCCTTTTCCA CCTCCGCCGC CAGAGCCACT CCCGTCAGGA GTACCATTCA AAAGAGTAAT 13980

AATTACCTGT CCCTTATCAT CATAAGGAAC ACCATCTTTA TAGTACGCTA CCGCGGTTTC 14040

CATTATAAAA TCCTCTTTGA CTTTTAAAAC AATAAGTTAA AAATAAATAC TGTACATATA 14100

ACCACTGGTT TTATATACAG CATAAAAGCT ACGCCGCTGC ATTTTCCCTG TCAAGACTGT 14160

GGACTTCCAT TTTTGTGAAA ACGATCAAAA AAACAGTCTT TCACACCACG CGCTATTCTC 14220

GCCCGATGCC ACAAAAACCA GCACAAACAT TACCGTTCTC AGACCTCATT ATGTTTTACT 14280

GAAACTATGA GATGAGACAT CTATGGGACA CTGTCACTTT ATGGCATGGC ACACACTCCG 14340

GGACGCACTA AAAATGACAG GCAGATCGCG TTCACAGTTT TACCGTGATA TGCGCGGAGG 14400

CCTTGTCAGT TACCGTACCG GCAGGGACGG ACGACGGGAG TTTGAAACCA GTGAACTGAT 14460

CCGGGCATAC GGCGAATTAA AGCAGAATGA GACACCAGAA AGGCACAGTG AGGGACATGC 14520

AGAAAATCCA CATGATCAGC AGACAGAACG CATTCTCCGG GAACTGAATG AGCTGAAACA 14580

ATGCCTGACG CTGATGCTTG AGGATAAACA GGCACAGGAT ATGGATCGCA GACGCCAGGA 14640

AGCAGAACGG GAACAGCTAC AAAATGAGAT AGCCCAGCTC AGGCAGGCAC TGGAACTGGA 14700

AAAGAAACGG GGATTCTGGT CCAGGTTGTT CGGTCGCTGA ACGCTGTCAG AGACTGATGA 14760

TAAAATAGTC TTCGGATAAT AACTCACCGA GAATAAATAC TTTAAGGTAG GGAGACACTC 14820

ATGAGACGTA CCGGAAACAA ACTTTGTCTT ATCGCCATGA TAACAGCAAC AGTAGCTCTC 14880

ACAGCCTGTA CCCCAAAGGG CAGCGTGGAA CAACATACCC GGCATTACGT ATATGCTTCT 14940

GATGACGGTT TTGATCCCAA CTTTTCCACC CAAAAAGCCG ACACAACACG AATGATGGTG 15000

CCTTTTTTTC GGCAGTTCTG GGATATGGGA GCTAAAGACA AAGCGACAGG AAAATCACGG 15060

AGTGATGTGC AACAACGCAT TCAGCAGTTT CACAGCCAAG AATTTTTAAA CTCACTCCGG 15120

GGCACAACTC AATTTGCGGG TACTGATTAC CGCAGCAAAG ACCTTACCCC GAAAAAATCC 15180

AGGCTGCTGG CTGACACGAT TTCTGCGGTT TATCTCGATG GCTACGAGGG CAGACAGTAA 15240

GTGGATTTAC CATAATCCCT TAATTGTACG CACCGCTAAA ACGCGTTCAG CGCGATCACG 15300

GCAGCAGACA GGTAAAAATG GCAACAAACC ACCCTAAAAA CTGCGCGATC GCGCCTGATA 15360

AATTTTAACC GTATGAATAC CTATGCAACC AGAGGGTACA GGCCACATTA CCCCCACTTA 15420

ATCCACTGAA GCTGCCATTT TTCATGGTTT CACCATCCCA GCGAAGGGCC ATCCAGCGTG 15480

CGTTCCTGTA TTTCCGGCTG ACGCTCCCGT TCTAGGGATA ACACATGTTC GCGCTCCTGT 15540

ATCAGCCGTT CCTCTCTTAT CTCCAGTTCT CGCTGTATAA CTGGCTCAAG CGTTCTGTCT 15600

GCTCGCTCAA GTGTTGCACC TGCTGACTCA ACTGCATGAC CCGCTCGTTC AGCATCGCGT 15660

TGTCCCGTTG CGTAAGCGAA AACATCTTCT GCAATTCCAC GAAGGCGCTC TCCCATTCGC 15720

TCAGCCGCTG CATATAGTCC TGTTGCAGCT GCTCTAAGGC GTTCAGCAAA TGTGTTTCCA 15780

GCTCTGTCAC TCTGTGTCAC TCCTTCAGAT GTACCCACTC TTTCCCCTGA AAGGGAATCA 15840

CCTCCGCTGA TTTCCCGTAC GGAAGGACAA GGAATTTCCT GTTCCCGTCC TGCACAAACT 15900

CCACGCCCCA TGTCTTCGCG TTCAGTTTCT GCAATGTCTC TTCCTGCTTC CTGATTTCTT 15960

CCAGGTTCGC CTGTATCCTC CCTCCAAGAT ACCAGAGCGT CCCGCCACTC GCGGTAAACA 16020

GGAGAAAGAC TATCCCCAGT AACATCATGC CCGTATTCCC TGCCAGCTTT AACACGTCCC 16080

TCCTGTGCTG CATCATCGCC TCTTTCACCC CTTCCCGGTG TTTTTCCAGC GATTCCTCTG 16140

TCGAGGCTGT GAACAGGGCT ATAGCGTCTC TGATTTTCGT CTCGTTTGAT GTCACAGCCT 16200

CGCTTACAGA TTCGCCGAGC CTCCTGAACT CGTTGTTCAG CATTTTCTCT GTAGATTCGG 16260

CTCTCTCTTT CAGCTTTTTC TCGAACTCCG CGCCCGTCTG CAAAAGATTG CTCATAAAAT 16320

GCTCCTTTCA GCCTGATATT CTTCCCGCCG TTCGGATCTG CAATGCTGAT ACTGCTTCGC 16380

GTCACCCTGA CCACTTCCAG CCCCGCCTCA GTGAGCGCCT GAATCACATC CTGACGGCCT 16440

TTTATCTCTC CGGCATGGTA AAGTGCATCT ATACCTCGCG TGACGCCCTC AGCAAGCGCC 16500

TGTTTCGTTT CAGGCAGGTT ATCAGGGAGT GTCAGCGTCC TGCGGTTCTC CGGGGCGTTC 16560

GGGTCATGCA GCCCGTAATG GTGATTTAAC AGCGTCTGCC AAGCATCAAT TCTAGGCCTG 16620

TCTGCGCGGT CGTAGTACGG CTGGAGGCGT TTTCCGGTCT GTAGCTCCAT GTTCGGAATG 16680

ACAAAATTCA GCTCAAGCCG TCCCTTGTCC TGGTGCTCCA CCCACAGGAT GCTGTACTGA 16740

TTTTTTTCGA GACCGGGCAT CAGTACACGC TCAAAGCTCG CCATCACTTT TTCACGTCCT 16800

CCCGGCGGCA GCTCCTTCTC CGCGAACGAC AGAACACCGG ACGTGTATTT CTTCGCAAAT 16860

GGCGTGGCAT CGATGAGTTC CCGGACTTCT TCCGGTATAC CCTGAAGCAC CGTTGCGCCT 16920

TCGCGGTTAC GCTCCCTCCC CAGCAGGTAA TCAACCGGAC CACTGCCACC ACCTTTTCCC 16980

CTGGCATGAA ATTTAACTAT CATCCCGCGC CCCCTGTTCC CTGACAGCCA GACGCAGCCG 17040

GCGCAGCTCA TCCCCGATGG CCATCAGTGC GGCCACCACC TGAACCCGGT CACCGGAAGA 17100

CCACTGCCCG CTGXXCACCT TACGGGCTGT CTGATTCAGG TTATTTCCGA TGGCGGCCAG 17160

CTGACGCAGT AACGGCGGTG CCAGTGTCGG CAGTTTTCCG GAACGGGCAA CCGGCTCCCC 17220

CAGGCAGACC CGCCGCATCC ATACCGCCAG TTGTTTACCC TCACAGCGTT CAAGTAACCG 17280

GGCATGTTCA TCATCAGTAA CCCGTATTGT GAGCATCCTC TCGCGTTTCA TCGGTATCAT 17340

TACCCCATGA ACAGAAATCC CCCTTACACG GAGGCATCAG TGACTAAACA GGAAAAAACC 17400

GCCCTTAACA TGGCCCGCTT TATCAGAAGC CAGACATTAA CGCTGCTGGA GAAGCTCAAC 17460

GAACTGGACG CAGATGAACA GGCCGATATT TGTGAATCGC TTCACGACCA CGCCGATGAG 17520

CTTTACCGCA GCTGCCTCGC ACGTTTCGGG GATGACGGTG AAAACCTCTG ACACATGCAG 17580

CTCCCGGAGA CGGTCACAGC TTGTCTGTGA GCGGATGCCG GGAGCTGACA AGCCCGTCAG 17640

GGCGCGTCAG CAGGTTTTAG CGGGTGTCGG GGCGCAGCCC TGACCCAGTC ACGTAGCGAT 17700

AGCGGAGTGT ATACTGGCTT AACCATGCGG CATCAGTGCG GATTGTATGA AAAGTACGCC 17760

ATGCCGGGTG TGAAATGCCG CACAGATGCG TAAGGAGAAA ATGCACGTCC AGGCGCTTTT 17820

CCGCTTCCTC GCTCACTGAC TCGCTACGCT CGGTCGTTCG ACTGCGGCGA GCGGTACTGA 17880

CTCACACAAA AACGGTAACA CAGTTATCCA CAGAATCAGG GGATAAGGCC GGAAAGAACA 17940

TGTGAGCAAA AGACCAGGAA CAGGAAGAAG GCCACGTAGC AGGCGTTTTT CCATAGGCTC 18000

CGCCCCCCTG ACGAGCATCA CAAAAATAGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 18060

GGACTATAAA GCTACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC TCCTGTTCCG 18120

ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT CGGGAAGCGT GGCGCTTTCT 18180

CATAGCTCAC GCTGTTGGTA TCTCAGTTCG GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT 18240

GTGCACGAAC CCCCCGTTCA GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG 18300

TCCAACCCGG TAAGGCACGC CTTAACGCCA CTGGCAGCAG CCACTGGTAA CCGGATTAGC 18360

AGAGCGATGA TGGCACAAAC GGTGCTACAG AGTTCTTGAA GTAGTGGCCC GACTACGGCT 18420

ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA 18480

GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGTTGG TAGCGGTGGT TTTTTTGTTT 18540

GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTA ATCTTTTCTA 18600

CTGAACCGCG ATCCCCGTCA GTTTAGAAGA GGAGGATGGT GCGATGGTCC CTCCCTGAAC 18660

ATCAGGTATA TAGTTAGCCT GACATCCAAC AAGGAGGTTT ATCGCGAATA TTCCCACAAA 18720

AAATCTTTTC CTCATAACTC GATCCTTATA AAATGAAAAG AATATATGGC GAGGTTTAAT 18780

TTATGAGCTT AAGATACTAC ATAAAAAATA TTTTATTTGG CCTGTACTGC ACACTTATAT 18840

ATATATACCT TATAACAAAA AACAGCGAAG GGTATTATTT CCTTGTGTCA GATAAGATGC 18900

TATATGCAAT AGTGATAAGC ACTATTCTAT GTCCATATTC AAAATATGCT ATTGAATACA 18960

TAGCTTTTAA CTTCATAAAG AAAGATTTTT TCGAAAGAAG AAAAAACCTA AATAACGCCC 19020

CCGTAGCAAA ATTAAACCTA TTTATGCTAT ATAATCTACT TTGTTTGGTC CTAGCAATCC 19080

CATTTGGATT GCTAGGACTT TTTATATCAA TAAAGAATAA TTAAATCCCT AACACCTCAT 19140

TTATAGTATT AAGTTTATTC TTATCAATAT AGGAGCATAG AA 19182
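
The listing above prints the sequence in blocks of ten nucleotides, with the cumulative position of the last nucleotide at the end of each row. The following is a minimal sketch, not part of the original disclosure, of how that running count could be checked for a run of rows. The function name `check_rows` and its `start` parameter are hypothetical, and the check looks only at lengths, not at the identity of the bases.

```python
def check_rows(rows, start=0):
    """Check cumulative position numbers in sequence-listing rows.

    rows  -- strings such as "ACGTACGTAC ACGTACGTAC 20": whitespace-separated
             nucleotide blocks followed by the stated cumulative position
    start -- position of the last nucleotide before the first row (assumed)

    Returns a list of (row_index, counted_position, stated_position) for
    every row whose stated position disagrees with the running count.
    """
    problems = []
    total = start
    for index, row in enumerate(rows):
        *blocks, stated = row.split()
        total += sum(len(block) for block in blocks)
        if total != int(stated):
            problems.append((index, total, int(stated)))
            total = int(stated)  # resynchronise so one bad row is reported once
    return problems


if __name__ == "__main__":
    # The final two rows of the listing; the preceding row ends at 19080.
    last_rows = [
        "CATTTGGATT GCTAGGACTT TTTATATCAA TAAAGAATAA TTAAATCCCT AACACCTCAT 19140",
        "TTATAGTATT AAGTTTATTC TTATCAATAT AGGAGCATAG AA 19182",
    ]
    print(check_rows(last_rows, start=19080))
```

A row that was damaged in transcription (a dropped or duplicated nucleotide) shows up as a mismatch between the counted and stated positions, which is why a length-only check is still useful for a listing of this size.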
1. A system for transposing a transposable DNA sequence 
in vitro, the system comprising: 

a Tn5 transposase modified relative to a wild type Tn5 
transposase, the modified transposase comprising a change 
relative to the wild type Tn5 transposase that causes the 
modified transposase to bind to Tn5 outside end repeat 
sequences with greater avidity than the wild type Tn5 
transposase, and a change relative to the wild type Tn5 
transposase that causes the modified transposase to be less 
likely than the wild type transposase to assume an inactive 
multimeric form; 

a donor DNA molecule comprising the transposable DNA 
sequence, the DNA sequence being flanked at its 5'- and 3'-ends 
by the Tn5 outside end repeat sequences; and 

a target DNA molecule into which the transposable element 
can transpose. 

2. A system as claimed in Claim 1 wherein the change that 
causes the modified transposase to bind with greater avidity is 
characterized as a substitution mutation at position 54 of the 
wild type transposase. 

3. A system as claimed in Claim 2 wherein position 54 is 
a lysine. 

4. A system as claimed in Claim 1 wherein the change that 
causes the modified transposase to be less likely to assume an 
inactive multimeric form is characterized as a substitution 
mutation at position 372 of the wild type transposase. 

5. A system as claimed in Claim 4 wherein position 372 
is a proline. 

6. A system as claimed in Claim 1 wherein the modified 
transposase further comprises a substitution mutation at 
position 56 of the wild type transposase. 
7. A system as claimed in Claim 6 wherein position 56 is 
an alanine. 

8. A system as claimed in Claim 1 wherein the donor DNA 
molecule is flanked at its 5'- and 3'-ends by an 18 or 19 base 
pair flanking DNA sequence comprising nucleotide A at position 
10, nucleotide T at position 11, and nucleotide A at position 
12. 

9. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 4 selected 
from the group consisting of A or T. 

10. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 15 selected 
from the group consisting of G or C. 

11. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 17 selected 
from the group consisting of A or T. 

12. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 18 selected 
from the group consisting of G or C. 

13. The system as claimed in Claim 8 wherein the flanking 
sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'. 

14. The system as claimed in Claim 8 wherein the flanking 
sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'. 
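
The positional requirements of Claims 8 to 12 can be checked mechanically. The sketch below is an illustration only, not part of the claims. It assumes that positions are counted from the 5' end starting at 1, and the function name `check_flank` and the layout of its result are hypothetical.

```python
# Positions are 1-based from the 5' end (an assumption made for this sketch).
REQUIRED = {10: "A", 11: "T", 12: "A"}                 # Claim 8
DEPENDENT = {4: "AT", 15: "GC", 17: "AT", 18: "GC"}    # Claims 9 to 12


def check_flank(seq):
    """Report which positional requirements a flanking sequence meets.

    Returns {"claim_8": bool, "dependent": {position: bool}}.
    Sequences that are not 18 or 19 nucleotides of A, C, G, T fail Claim 8
    and get an empty "dependent" mapping.
    """
    seq = seq.upper()
    if len(seq) not in (18, 19) or set(seq) - set("ACGT"):
        return {"claim_8": False, "dependent": {}}
    claim_8 = all(seq[pos - 1] == base for pos, base in REQUIRED.items())
    dependent = {pos: seq[pos - 1] in allowed
                 for pos, allowed in DEPENDENT.items()}
    return {"claim_8": claim_8, "dependent": dependent}


if __name__ == "__main__":
    for flank in ("CTGTCTCTTATACACATCT", "CTGTCTCTTATACAGATCT"):
        print(flank, check_flank(flank))
```

Under the stated numbering assumption, both sequences recited in Claims 13 and 14 satisfy the A-T-A requirement at positions 10 to 12 and all four dependent positions; they differ only at position 15 (C versus G).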
15. A Tn5 transposase modified relative to a wild type 
Tn5 transposase, the modified transposase comprising: 

a change relative to the wild type Tn5 transposase that 
causes the modified transposase to bind to Tn5 outside end 
repeat sequences of a donor DNA with greater avidity than the 
wild type Tn5 transposase; and 

a change relative to the wild type Tn5 transposase that 
causes the modified transposase to be less likely than the wild 
type transposase to assume an inactive multimeric form. 

16. A modified Tn5 transposase as claimed in Claim 15 
wherein the change that causes the modified transposase to bind 
with greater avidity is characterized as a substitution 
mutation at position 54 of the wild type transposase. 

17. A modified Tn5 transposase as claimed in Claim 16 
wherein position 54 is a lysine. 

18. A modified Tn5 transposase as claimed in Claim 15 
wherein the change that causes the modified transposase to be 
less likely to assume an inactive multimeric form is 
characterized as a substitution mutation at position 372 of the 
wild type transposase. 

19. A modified Tn5 transposase as claimed in Claim 18 
wherein position 372 is a proline. 

20. A modified Tn5 transposase as claimed in Claim 15 
further comprising a substitution mutation at position 56 of 
the wild type transposase. 

21. A modified Tn5 transposase as claimed in Claim 20 
wherein position 56 is alanine. 
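
As an illustration only, the residues recited in Claims 17, 19 and 21 can be checked against a protein sequence given in one-letter code. The sketch assumes that positions are numbered from 1 at the N-terminus; the name `claimed_residues` is hypothetical, and the demonstration string is a synthetic placeholder rather than the Tn5 transposase sequence.

```python
# Residues recited in the claims: lysine (K) at 54, alanine (A) at 56,
# proline (P) at 372. Numbering from 1 at the N-terminus is an assumption.
CLAIMED = {54: "K", 56: "A", 372: "P"}


def claimed_residues(protein):
    """Map each claimed position to True if the protein carries that residue."""
    protein = protein.upper()
    return {pos: len(protein) >= pos and protein[pos - 1] == residue
            for pos, residue in CLAIMED.items()}


if __name__ == "__main__":
    # Synthetic placeholder sequence (all glycine), NOT a real transposase.
    placeholder = list("G" * 400)
    for pos, residue in CLAIMED.items():
        placeholder[pos - 1] = residue
    print(claimed_residues("".join(placeholder)))
```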

22. A genetic construct comprising a nucleotide sequence 
that can encode a Tn5 transposase that both has greater avidity 
for Tn5 outside end repeats and is less likely to assume an 
inactive multimeric form than a wild type Tn5 transposase. 
23. A genetic construct as claimed in Claim 22 comprising 
a nucleotide sequence that encodes a lysine residue at amino 
acid 54 of the transposase. 

24. A genetic construct as claimed in Claim 22 comprising 
a nucleotide sequence that encodes a proline residue at amino 
acid 372 of the transposase. 

25. A genetic construct as claimed in Claim 22 comprising 
a nucleotide sequence that encodes a lysine residue at amino 
acid 54 of the transposase and a proline residue at amino acid 
372 of the transposase. 

26. A genetic construct as claimed in Claim 22 comprising 
the nucleotide sequence of SEQ ID NO:1. 

27. A genetic construct comprising: 

a transposable DNA sequence flanked at its 5' and 3' ends 
by an 18 or 19 base pair flanking DNA sequence comprising 
nucleotide A at position 10, nucleotide T at position 11, and 
nucleotide A at position 12. 

28. The construct of Claim 27 further comprising, at 
position 4 of the flanking sequence, a nucleotide selected from 
the group consisting of T or A. 

29. The construct of Claim 27 further comprising, at 
position 15 of the flanking sequence, a nucleotide selected 
from the group consisting of G or C. 

30. The construct of Claim 27 further comprising, at 
position 17 of the flanking sequence, a nucleotide selected 
from the group consisting of T or A. 

31. The construct of Claim 27 further comprising, at 
position 18 of the flanking sequence, a nucleotide selected 
from the group consisting of G or C. 
32. The construct as claimed in Claim 27 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'. 

33. The construct as claimed in Claim 27 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'. 

34. A method for in vitro transposition, the method 
comprising the steps of: 

combining a donor DNA molecule that comprises a 
transposable DNA sequence of interest, the DNA sequence of 
interest being flanked at its 5'- and 3'-ends by Tn5 outside 
end repeat sequences, with a target DNA molecule and a Tn5 
transposase modified relative to wild type Tn5 transposase in a 
suitable reaction buffer at a temperature below a physiological 
temperature until the modified transposase binds to the outside 
end repeat sequences; and 

raising the temperature to a physiological temperature for 
a period of time sufficient for the enzyme to catalyze in vitro 
transposition, 

wherein the modified transposase comprises a change 
relative to the wild type Tn5 transposase that causes the 
modified transposase to bind to the Tn5 outside end repeat 
sequences with greater avidity than the wild type Tn5 
transposase, and a change relative to the wild type Tn5 
transposase that causes the modified transposase to be less 
likely than the wild type transposase to assume an inactive 
multimeric form. 

35. A method as claimed in Claim 34 wherein the change 
that causes the modified transposase to bind with greater 
avidity is characterized as a substitution mutation at position 
54 of the wild type transposase. 

36. A method as claimed in Claim 35 wherein position 54 
is a lysine. 
37. A method as claimed in Claim 34 wherein the change 
that causes the modified transposase to be less likely to 
assume an inactive multimeric form is characterized as a 
substitution mutation at position 372 of the wild type 
transposase. 

38. A method as claimed in Claim 37 wherein position 372 
is a proline. 

39. A method as claimed in Claim 34 wherein the modified 
transposase further comprises a substitution mutation at 
position 56 of the wild type transposase. 

40. A method as claimed in Claim 39 wherein position 56 
is an alanine. 



41. A method as claimed in Claim 34 wherein the DNA 
sequence of interest is flanked at its 5'- and 3'-ends by an 18 
or 19 base pair flanking DNA sequence comprising nucleotide A 
at position 10, nucleotide T at position 11, and nucleotide A 
at position 12. 



42. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 4 
selected from the group consisting of A or T. 

43. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 15 
selected from the group consisting of G or C. 

44. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 17 
selected from the group consisting of A or T. 

45. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 18 
selected from the group consisting of G or C. 
46. The method as claimed in Claim 41 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'. 

47. The method as claimed in Claim 41 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'. 

[Figure panels (panel label "b" legible): "Papillation of IE Mutants with EK54 Tnp"; "Papillation of IE Mutants with wt Tnp"]